OLMo-0424-7B

Version: 7B | GPU | PyTorch

This version is not quantized and a GPU is recommended.

  1. Install our magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install the max-pipelines package to run this model:

    magic global install max-pipelines && magic global update

  3. Start a local endpoint for OLMo-7B-0424:

    max-pipelines serve --huggingface-repo-id=allenai/OLMo-7B-0424

    The endpoint is ready when you see the URI printed in your terminal:

    Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)

  4. Now open another terminal and send a request using curl (a Python client sketch follows these steps):

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "allenai/OLMo-7B-0424",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n'

  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
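Each streamed chunk the server returns is a server-sent event whose JSON payload carries the next piece of text under a "content" field; the grep/sed/tr pipeline above just extracts and concatenates those fragments. Because the endpoint speaks the OpenAI chat completions API, you can also consume the stream with an OpenAI-compatible client instead of shell tools. Below is a minimal Python sketch, assuming the openai package is installed (pip install openai) and the server from step 3 is running; the api_key value is a placeholder, since the local endpoint doesn't check it.

    from openai import OpenAI

    # Point the client at the local MAX endpoint. The key is unused locally
    # but the client constructor requires one, so any string works.
    client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

    stream = client.chat.completions.create(
        model="allenai/OLMo-7B-0424",
        stream=True,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"},
        ],
    )

    # Print tokens as they arrive; some chunks carry no content, so guard.
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

If you'd rather receive one complete JSON response instead of a stream, set "stream": false in the curl request (or stream=False here); the full reply then appears under choices[0].message.content and no post-processing pipeline is needed.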

Deploy this model to cloud

DETAILS

Chat

MODEL CLASS: PyTorch
HARDWARE: GPU
QUANTIZATION: none
ARCHITECTURE: PyTorch

MAX GITHUB: Modular / MAX

MODEL: allenai/OLMo-7B-0424

QUESTIONS ABOUT THIS MODEL?

Leave a comment

PROBLEMS WITH THE CODE?

File an Issue

TAGS

transformers / pytorch / olmo / text-generation / en / dataset:allenai/dolma / arxiv:2402.00838 / license:apache-2.0 / autotrain_compatible / endpoints_compatible / region:us
