mistral-nemo-12b

MAX Model

1 version

A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.

Run this model

  1. Install our Magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install MAX Pipelines to run this model:

    magic global install max-pipelines

  3. Start a local endpoint for mistral-nemo/12b:

    max-pipelines serve --huggingface-repo-id mistralai/Mistral-Nemo-Instruct-2407

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

  4. Now open another terminal to send a request using curl (a Python alternative is sketched after these steps):

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "mistral-nemo/12b",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'

  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
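
If you'd rather call the endpoint from code, here is a minimal Python sketch of the same request as step 4. It assumes the `openai` package (`pip install openai`) and that the server is OpenAI-compatible, as the /v1/chat/completions route above suggests; the api_key value is a placeholder, since the local server does not validate it.

    # Minimal sketch: stream a chat completion from the local MAX endpoint.
    # Assumption: the server speaks the OpenAI chat-completions protocol
    # (suggested by the /v1/chat/completions route); "EMPTY" is a dummy key.
    from openai import OpenAI

    client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

    stream = client.chat.completions.create(
        model="mistral-nemo/12b",
        stream=True,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"},
        ],
    )

    # Print tokens as they arrive, mirroring the curl | grep | sed pipeline.
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

Any OpenAI-compatible client should work the same way; only the base_url and the model name are specific to this deployment.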

About

Mistral NeMo is a cutting-edge 12-billion-parameter AI model, developed in collaboration with NVIDIA. Designed for high performance, it distinguishes itself with a large context window of up to 128k tokens, enabling enhanced long-form understanding and processing. The model excels in reasoning, possesses comprehensive world knowledge, and demonstrates exceptional coding accuracy, standing as a leader in its size category.

Built upon a standard transformer architecture, Mistral NeMo offers ease of integration, making it a versatile, drop-in upgrade for systems currently utilizing the Mistral 7B model. Its adaptability and state-of-the-art performance make it an appealing choice for a variety of AI applications, from conversational interfaces to advanced coding tasks.

Figure: Mistral NeMo base model performance (nemo-base-performance.png)

Reference

Blog

Hugging Face

DETAILS

MODEL CLASS
MAX Model

MAX Models are highly optimized inference pipelines that deliver state-of-the-art performance for a given model on both CPU and GPU. For many of these models, the MAX version is the fastest implementation of that model available anywhere.

Browse 18+ MAX Models

MODULAR GITHUB

Modular

CREATED BY

mistralai

MODEL

mistralai/Mistral-Nemo-Instruct-2407

TAGS

autotrain_compatible
base_model:finetune:mistralai/Mistral-Nemo-Base-2407
base_model:mistralai/Mistral-Nemo-Base-2407
conversational
de
en
endpoints_compatible
es
fr
it
ja
license:apache-2.0
mistral
pt
region:us
ru
safetensors
text-generation
text-generation-inference
transformers
zh

© Copyright Modular Inc 2024