mistral-small-22b

MAX Model

1 version

Mistral Small v24.09 sets a new benchmark in the “small” Large Language Model category, below 70B parameters.

Run this model

  1. Install our magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install MAX Pipelines so you can run this model:

    magic global install max-pipelines

  3. Start a local endpoint for mistral-small/22b:

    max-pipelines serve --huggingface-repo-id mistralai/Mistral-Small-Instruct-2409

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

  4. Now open another terminal to send a request using curl. The grep/sed/tr pipeline strips the streamed response down to just the generated text (a Python alternative is sketched after these steps):

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "mistral-small/22b",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'

  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
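
Because the endpoint speaks the OpenAI-compatible chat completions API used in step 4, you can also drive it from Python. The sketch below is a minimal example, assuming the official openai client (pip install openai) and the same model name as above; the placeholder API key is arbitrary, since a local endpoint typically doesn't check it.

    # Minimal sketch: stream a chat completion from the local MAX endpoint.
    # Assumes the server from step 3 is running on http://0.0.0.0:8000.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://0.0.0.0:8000/v1",  # local endpoint from step 3
        api_key="EMPTY",  # placeholder; a local server typically ignores it
    )

    stream = client.chat.completions.create(
        model="mistral-small/22b",
        stream=True,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"},
        ],
    )

    # Print tokens as they arrive, mirroring the curl pipeline above.
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()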

About

Mistral Small v24.09 is an advanced 22B parameter language model designed for improved human alignment, reasoning, and code generation. It strikes a balance between efficiency and performance, positioned as a cost-effective option between Mistral NeMo 12B and Mistral Large 2.

Key Features

  • Cost-efficient: Provides a solution that balances performance and budget for diverse applications.
  • Versatile: Excels in tasks like translation, summarization, and sentiment analysis.
  • Flexible deployment: Compatible with various platforms and environments for easy integration.
  • Performance upgrade: Offers significant improvements over the Mistral Small v24.02 model.
  • 32k sequence length: Supports extended context for enhanced task performance.
  • Balanced solution: Fast, reliable, and suited for scenarios where full-scale general-purpose models are unnecessary.

This model serves as a robust midpoint for users seeking high-quality outputs without the complexity or resource demands of larger models.


References

Blog post

Hugging Face

DETAILS

MODEL CLASS
MAX Model

MAX Models are highly optimized inference pipelines that deliver state-of-the-art performance on both CPU and GPU. For many of these models, the MAX implementation is the fastest version available anywhere.

Browse 18+ MAX Models

MODULAR GITHUB

Modular

CREATED BY

mistralai

MODEL

mistralai/Mistral-Small-Instruct-2409

TAGS

chat
de
en
es
fr
it
ja
ko
license:other
mistral
pt
region:us
ru
safetensors
vllm
zh

© Copyright - Modular Inc - 2024