models / Nemotron-Mini-Instruct-4B

Version: 4B | GPU | PyTorch

This version is not quantized and a GPU is recommended.

  1. Install Magic, our package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install the MAX pipelines package to run this model:

    magic global install max-pipelines && magic global update
  3. Start a local endpoint for Nemotron-Mini-4B-Instruct:

    max-pipelines serve --huggingface-repo-id=nvidia/Nemotron-Mini-4B-Instruct

    The endpoint is ready when you see the URI printed in your terminal:

    Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "nvidia/Nemotron-Mini-4B-Instruct",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n'
  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
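The filters at the end of the curl command in step 4 are what turn the raw response stream into readable text: each chunk the endpoint emits carries a JSON delta, and the grep/sed pipeline pulls out just the "content" fragments. A minimal sketch of the same filters on a single sample chunk (the chunk shape here is assumed from the OpenAI-compatible streaming format the endpoint follows):

```shell
# One streamed chunk as the server might emit it (shape assumed from the
# OpenAI-compatible chat completions streaming format):
chunk='data: {"choices":[{"delta":{"content":"The Dodgers won."}}]}'

# The same filters as in step 4: keep only the "content" field, then strip
# the key and the surrounding quotes, leaving just the generated text.
echo "$chunk" | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g'
# prints: The Dodgers won.
```

Applied to the full stream, these filters concatenate every content fragment into one clean answer; the trailing `tr -d '\n'` in step 4 joins the fragments onto a single line.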

Deploy this model to the cloud

DETAILS

Chat

MODEL CLASS: PyTorch
HARDWARE: GPU
QUANTIZATION: none (not quantized)
ARCHITECTURE: PyTorch

MAX GITHUB

Modular / MAX

MODEL

nvidia/Nemotron-Mini-4B-Instruct

QUESTIONS ABOUT THIS MODEL?

Leave a comment

PROBLEMS WITH THE CODE?

File an Issue

TAGS

transformers / pytorch / nemo / nemotron / text-generation / nvidia / llama-3 / conversational / en / arxiv:2402.16819 / arxiv:2407.14679 / license:other / autotrain_compatible / endpoints_compatible / region:us

Resources & support for
running Nemotron-Mini-Instruct-4B

© Copyright 2025, Modular Inc