models / Nemotron-Mini-Instruct-4B

Version: 4B | GPU | PyTorch

This version is not quantized and a GPU is recommended.

  1. Install Magic, our package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install the MAX pipelines package to run this model:

    magic global install max-pipelines && magic global update
  3. Start a local endpoint for Nemotron-Mini-4B-Instruct:

    max-pipelines serve --huggingface-repo-id=nvidia/Nemotron-Mini-4B-Instruct

    The endpoint is ready when you see the URI printed in your terminal:

    Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "nvidia/Nemotron-Mini-4B-Instruct",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n'
  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
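The filters at the end of the curl command in step 4 are what turn the raw response stream into readable text: each chunk the endpoint emits carries a JSON delta, and the grep/sed pipeline pulls out just the "content" fragments. A minimal sketch of the same filters on a single sample chunk (the chunk shape here is assumed from the OpenAI-compatible streaming format the endpoint follows):

```shell
# One streamed chunk as the server might emit it (shape assumed from the
# OpenAI-compatible chat completions streaming format):
chunk='data: {"choices":[{"delta":{"content":"The Dodgers won."}}]}'

# The same filters as in step 4: keep only the "content" field, then strip
# the key and the surrounding quotes, leaving just the generated text.
echo "$chunk" | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g'
# prints: The Dodgers won.
```

Applied to the full stream, these filters concatenate every content fragment into one clean answer; the trailing `tr -d '\n'` in step 4 joins the fragments onto a single line.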

Deploy this model to the cloud

DETAILS

Chat

MODEL CLASS: PyTorch
HARDWARE: GPU
QUANTIZATION: none (not quantized)
ARCHITECTURE: PyTorch

MAX GITHUB

Modular / MAX

MODEL

nvidia/Nemotron-Mini-4B-Instruct

QUESTIONS ABOUT THIS MODEL?

Leave a comment

PROBLEMS WITH THE CODE?

File an Issue

TAGS

transformers / pytorch / nemo / nemotron / text-generation / nvidia / llama-3 / conversational / en / arxiv:2402.16819 / arxiv:2407.14679 / license:other / autotrain_compatible / endpoints_compatible / region:us

Resources & support for
running Nemotron-Mini-Instruct-4B

© Copyright 2025, Modular Inc