Version: 8B GPU BF16

MAX Model

This version is not quantized and a GPU is recommended.
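As a rough sizing check, BF16 stores each weight in 2 bytes, so the unquantized weights alone take about 15 GiB of GPU memory before any activations or KV cache. This back-of-the-envelope sketch uses the parameter count from the Architecture Highlights table on this page and ignores runtime overhead.

```python
# Rough sizing of the unquantized (BF16) weights only.
# PARAMETERS is taken from the Architecture Highlights table on this page;
# activations, KV cache, and framework overhead are NOT included.
PARAMETERS = 8_019_808_256
BYTES_PER_BF16_VALUE = 2

weight_bytes = PARAMETERS * BYTES_PER_BF16_VALUE
print(f"{weight_bytes:,} bytes = {weight_bytes / 2**30:.2f} GiB")
# → 16,039,616,512 bytes = 14.94 GiB
```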

Install our Magic package manager:
```
curl -sSL https://magic.modular.com/ | bash
```
Then run the source command that's printed in your terminal.

Install MAX Pipelines to run this model:

```
magic global install max-pipelines && magic global update
```

Start a local endpoint for Ministral-8B-Instruct-2410:

```
max-pipelines serve --huggingface-repo-id=mistralai/Ministral-8B-Instruct-2410
```

The endpoint is ready when you see the URI printed in your terminal:

```
Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

Now open another terminal to send a request using curl:

```
curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "mistralai/Ministral-8B-Instruct-2410",
    "stream": true,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
}' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
```

🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
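The grep/sed pipeline is handy for eyeballing the stream, but it truncates any reply that contains an escaped quote, because `[^"]*` stops at the `"` in `\"`. A more robust sketch, using only the Python standard library, parses each `data:` line of the stream as JSON. It assumes the server follows the OpenAI chat-completions streaming format (chunks carrying `choices[].delta.content`, terminated by `data: [DONE]`) and listens on the URL used in the curl command; `iter_deltas` and `stream_chat` are hypothetical helpers, not part of MAX.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# ASSUMPTION: same local endpoint and model ID as the curl example.
ENDPOINT = "http://0.0.0.0:8000/v1/chat/completions"
MODEL = "mistralai/Ministral-8B-Instruct-2410"


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and keep-alive comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)  # json handles escaped quotes correctly
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(question: str) -> str:
    """POST one streaming chat request, print the reply as it arrives, and return it."""
    body = json.dumps({
        "model": MODEL,
        "stream": True,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    }).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"}
    )
    parts = []
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTP response yields raw byte lines.
        lines = (raw.decode("utf-8") for raw in response)
        for text in iter_deltas(lines):
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)
```

Once the server reports it is ready, call `stream_chat("Who won the World Series in 2020?")` to print the reply as it streams.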

Deploy this model to the cloud

Ministral-8B-Instruct-2410

Ministral-8B-Instruct-2410 is an instruction-tuned language model designed for on-device and edge computing use cases. It is part of the Ministraux family, alongside Ministral 3B, and is released under the Mistral Research License, which limits its use to research and academic purposes.

Key Features

License: Distributed under the Mistral Research License. For commercial licensing, contact Mistral AI through its contact page.
Long Context: Trained with a 128k context window and interleaved sliding-window attention.
Multilingual and Code Robustness: Trained on a large proportion of multilingual and code data.
Functional Capabilities: Supports function calling. Uses the V3-Tekken tokenizer, which has a vocabulary of 131k tokens.
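Interleaved sliding-window attention means some layers attend over the full context while others only see a recent window of tokens. The sketch below is our reading of the "Ragged (128k,32k,32k,32k)" attention pattern listed in the Architecture Highlights table, not a confirmed implementation: it assumes the four window sizes repeat every four layers across all 36 layers, that 128k = 131,072 and 32k = 32,768 tokens, and that a window of W lets a query see itself plus the W - 1 preceding keys. `window_for_layer` and `can_attend` are hypothetical names.

```python
# Hypothetical sketch of interleaved sliding-window attention.
# ASSUMPTION: the ragged pattern (128k, 32k, 32k, 32k) cycles over the layers,
# with 128k = 131,072 tokens and 32k = 32,768 tokens.
WINDOW_PATTERN = (131_072, 32_768, 32_768, 32_768)
NUM_LAYERS = 36


def window_for_layer(layer: int) -> int:
    """Return the attention window (in tokens) used by a given layer index."""
    return WINDOW_PATTERN[layer % len(WINDOW_PATTERN)]


def can_attend(query_pos: int, key_pos: int, window: int) -> bool:
    """Causal sliding-window rule: a query sees itself and the window - 1 keys before it."""
    distance = query_pos - key_pos
    return 0 <= distance < window


full_layers = [i for i in range(NUM_LAYERS) if window_for_layer(i) == WINDOW_PATTERN[0]]
print(len(full_layers), NUM_LAYERS - len(full_layers))  # → 9 27
```

Under these assumptions, a token at position 40,000 can still see position 0 in the 9 full-context layers, but not in the 27 windowed layers.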

Template

Basic instruct template:

```
<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```
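To make the turn layout concrete, here is a minimal sketch that renders alternating user/assistant turns into that template as a plain string. It assumes the `<s>` (beginning of sequence) and `</s>` (end of sequence) markers of the upstream Mistral V3-Tekken template; in real inference these are special token IDs rather than literal text, so use the `mistral_common` tokenizer or the server's chat endpoint for actual requests. `format_instruct` is a hypothetical helper and ignores system prompts and tool calls.

```python
from typing import Dict, List

# Shown as text for illustration; a real tokenizer emits special token IDs instead.
BOS, EOS = "<s>", "</s>"


def format_instruct(messages: List[Dict[str, str]]) -> str:
    """Render alternating user/assistant turns in the basic [INST] layout."""
    parts = [BOS]
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"turn {index} should be {expected!r}, got {message['role']!r}")
        if expected == "user":
            parts.append(f"[INST]{message['content']}[/INST]")
        else:
            parts.append(f"{message['content']}{EOS}")  # close each assistant turn
    return "".join(parts)


prompt = format_instruct([
    {"role": "user", "content": "user message"},
    {"role": "assistant", "content": "assistant response"},
    {"role": "user", "content": "new user message"},
])
print(prompt)
# → <s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```

The prompt ends with `[/INST]`, which is where the model is expected to begin generating its next response.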

Architecture Highlights

| Feature | Value |
| --- | --- |
| Architecture | Dense Transformer |
| Parameters | 8,019,808,256 |
| Layers | 36 |
| Heads | 32 |
| Dim | 4096 |
| KV Heads (GQA) | 8 |
| Hidden Dim | 12288 |
| Head Dim | 128 |
| Vocab Size | 131,072 |
| Context Length | 128k |
| Attention Pattern | Ragged (128k, 32k, 32k, 32k) |
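The parameter count can be reproduced from the other dimensions in the table. This sketch assumes a Mistral-style decoder layer: bias-free Q/K/V/O projections with grouped-query attention (8 KV heads shared by 32 query heads), a three-matrix gated (SwiGLU-style) MLP, two RMSNorm weight vectors per layer, one final norm, and separate (untied) input-embedding and output-head matrices. Only the dimensions come from the table; those structural details are our assumptions, and `dense_transformer_params` is a hypothetical helper.

```python
def dense_transformer_params(vocab: int, dim: int, layers: int, n_heads: int,
                             n_kv_heads: int, head_dim: int, hidden_dim: int,
                             tied_embeddings: bool = False) -> int:
    """Count weights in a bias-free GQA decoder with a gated MLP (assumed layout)."""
    attn = dim * n_heads * head_dim               # W_q
    attn += 2 * dim * n_kv_heads * head_dim       # W_k and W_v (fewer KV heads under GQA)
    attn += n_heads * head_dim * dim              # W_o
    mlp = 3 * dim * hidden_dim                    # gate, up, and down projections
    norms = 2 * dim                               # attention and MLP RMSNorm weights
    per_layer = attn + mlp + norms
    embeddings = vocab * dim * (1 if tied_embeddings else 2)  # input table + LM head
    return layers * per_layer + embeddings + dim  # + final RMSNorm


total = dense_transformer_params(131_072, 4096, 36, 32, 8, 128, 12_288)
print(f"{total:,}")  # → 8,019,808,256
```

The exact match with the table suggests the input embedding and output head are stored separately; with tied embeddings the total would be about 537M parameters lower.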

Benchmarks and Performance

Knowledge & Commonsense

Base-model results on the MMLU, AGIEval, Winogrande, Arc-c, and TriviaQA benchmarks:

| Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
| --- | --- | --- | --- | --- | --- |
| Ministral 8B Base | 65.0 | 48.3 | 75.3 | 71.9 | 65.5 |

Code & Math

Base-model results on code (HumanEval pass@1) and math (GSM8K maj@8):

| Model | HumanEval pass@1 | GSM8K maj@8 |
| --- | --- | --- |
| Ministral 8B Base | 34.8 | 64.5 |

Multilingual Proficiency

Base-model results on the French, German, and Spanish versions of MMLU:

| Model | French MMLU | German MMLU | Spanish MMLU |
| --- | --- | --- | --- |
| Ministral 8B Base | 57.5 | 57.4 | 59.6 |

Instruct Models

Chat/Arena results for the instruct model on MTBench, Arena Hard, and Wild Bench, judged by GPT-4o:

| Model | MTBench | Arena Hard | Wild Bench |
| --- | --- | --- | --- |
| Ministral 8B Instruct | 8.3 | 70.9 | 41.3 |

Usage Recommendations

vLLM Library (Recommended)

We recommend running this model with the vLLM library to build production-ready inference pipelines. vLLM currently caps the context size at 32k because its paged-attention kernels do not yet support the model's interleaved sliding-window attention.
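To see what the context cap means for memory, here is a rough per-sequence KV-cache estimate built from the Architecture Highlights dimensions. It assumes a BF16 cache (2 bytes per value), that every one of the 36 layers stores keys and values for every token (ignoring any savings the sliding-window layers could provide), and no paged-attention block overhead; `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(tokens: int, layers: int = 36, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions of one sequence."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = key + value
    return per_token * tokens


for label, tokens in (("32k", 32_768), ("128k", 131_072)):
    print(f"{label}: {kv_cache_bytes(tokens) / 2**30:.2f} GiB")
# → 32k: 4.50 GiB
# → 128k: 18.00 GiB
```

Because of GQA, only 8 KV heads are cached instead of 32, which makes the cache four times smaller than it would be with full multi-head attention.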

Installation

Upgrade to the latest packages for optimal performance:

```
pip install --upgrade vllm mistral_common
```

Contributions

The model was developed by the Mistral AI team.

Citations

For full details, see the following resources:
Mistral Research License: License Link
Mistral Contact: Contact Page
Mistral Privacy Policy: Privacy Policy

Metadata

| Key | Value |
| --- | --- |
| architectures.0 | MistralForCausalLM |
| model_type | mistral |

Version: 8B GPU BF16

This code works on compatible Linux machines.
We are actively working on enabling MAX Serve for macOS ARM64 as well.

You can quickly deploy Ministral-Instruct-2410-8B to an endpoint using our MAX container. It includes the latest version of MAX with GPU support and our Python-based inference server called MAX Serve.

With the following Docker command, you’ll get an OpenAI-compatible endpoint running Ministral-Instruct-2410-8B:

```
docker run --gpus 1 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_HUB_ENABLE_HF_TRANSFER=1" \
    --env "HF_TOKEN=" \
    -p 8000:8000 \
    docker.modular.com/modular/max-openai-api:nightly \
    --huggingface-repo-id mistralai/Ministral-8B-Instruct-2410
```

To download the model from Hugging Face, set HF_TOKEN to your Hugging Face access token. A token is not required for models hosted under https://huggingface.co/modularai.

Learn more

For more information about the container image, see the MAX container documentation.

To learn more about how to deploy MAX to the cloud, check out our MAX Serve tutorials.

LICENSE: mrl

DETAILS

Chat

MODEL CLASS: MAX Model

MAX Models are popular open-source models converted to MAX’s native graph format. Anything with the label is either SOTA or being worked on. Learn more about MAX Models.

HARDWARE: GPU

QUANTIZATION: BF16

ARCHITECTURE: MAX Model

MAX GITHUB: Modular / MAX

MODEL: mistralai/Ministral-8B-Instruct-2410
