Learn How to Generate Embeddings with MAX Serve
Embeddings are a crucial component of intelligent agents, enabling efficient search and retrieval of proprietary information. MAX supports creation of embeddings using an OpenAI-compatible API, including the ability to run the popular sentence-transformers/all-mpnet-base-v2 model from Hugging Face. When you run MPNet on MAX, you'll be serving a high-performance implementation of the model built by Modular engineers with the MAX Graph API.
In this recipe you will:
Serve the sentence-transformers/all-mpnet-base-v2 model locally with MAX Serve
Generate embeddings for a batch of sentences using the OpenAI Python client
MPNet works by encoding not only tokens (words and parts of words) but also positional data about where those tokens appear in a sentence. Upon its publication in 2020, MPNet met or exceeded the capability of popular predecessors, BERT and XLNet. Today, it is one of the most popular open-source models for generating embeddings.
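To see why embeddings are useful for search and retrieval, remember that each sentence becomes a vector, and semantically similar sentences produce vectors that point in similar directions. The snippet below is a minimal illustration only (not part of this recipe's code): it compares toy 3-dimensional vectors with cosine similarity using NumPy, whereas real MPNet embeddings have 768 dimensions.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors; real all-mpnet-base-v2 embeddings have 768 dimensions.
breakfast = np.array([0.9, 0.1, 0.2])
brunch = np.array([0.8, 0.2, 0.25])   # similar topic -> vectors nearly parallel
galaxy = np.array([-0.1, 0.9, -0.4])  # unrelated topic -> much lower similarity

print(cosine_similarity(breakfast, brunch))   # close to 1.0
print(cosine_similarity(breakfast, galaxy))   # noticeably smaller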
Please make sure your system meets our system requirements.
To proceed, ensure you have the magic CLI installed and that magic --version reports 0.7.2 or newer:
curl -ssL https://magic.modular.com/ | bash
or update it via:
magic self-update
Then install max-pipelines via:
magic global install max-pipelines=="25.2.0.dev2025031705"
For this recipe, you will need a valid Hugging Face token to access the model.
Once you have obtained the token, create your .env file from the example:
cp .env.example .env
then add your token to .env:
HUGGING_FACE_HUB_TOKEN=
Download the code for this recipe using the magic CLI:
magic init max-serve-openai-embeddings --from modular/max-recipes/max-serve-openai-embeddings
cd max-serve-openai-embeddings
Run the embedding application
Make sure port 8000 is available. You can adjust the port settings in Procfile.
magic run server
This command is defined in the pyproject.toml file and invokes the max-pipelines CLI.
MAX Serve is ready once you see a line containing the following output:
Server running on http://0.0.0.0:8000/
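If you drive the server from your own scripts, you may want to wait until it is ready before sending real traffic. The following is an optional sketch, not part of the recipe: it simply retries the same embeddings call that main.py makes (same local endpoint and model name) until the server answers.

import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

def wait_until_ready(timeout_s: float = 120.0) -> None:
    """Poll the server with a tiny embeddings request until it responds."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            client.embeddings.create(
                model="sentence-transformers/all-mpnet-base-v2",
                input=["ping"],
            )
            print("Server is ready.")
            return
        except Exception:
            time.sleep(2)  # server still starting; retry shortly
    raise TimeoutError("MAX Serve did not become ready in time")

if __name__ == "__main__":
    wait_until_ready()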
Run the embedding code in main.py with magic:
magic run main
And you should see output like this:
=== Generated embeddings with OpenAI client ===
Successfully generated embeddings!
Number of embeddings: 5
Embedding dimension: 768
First embedding, first few values: [0.36384445428848267, -0.7647817730903625, ...]
Once you're done with the app, clean up resources by running:
magic run clean
The code for this recipe is intentionally simple — we're excited for you to start building your own project on MAX.
Open up main.py in your code editor. At the top of the file, you'll see the following:
from openai import OpenAI

MODEL_NAME = "sentence-transformers/all-mpnet-base-v2" #1
BASE_URL = "http://localhost:8000/v1"
API_KEY = "local"

client = OpenAI(base_url=BASE_URL, api_key=API_KEY) #2


def main():
    """Test embeddings using OpenAI client"""
    sentences = [ #3
        "Rice is often served in round bowls.",
        "The juice of lemons makes fine punch.",
        "The bright sun shines on the old garden.",
        "The soft breeze came across the meadow.",
        "The small pup gnawed a hole in the sock."
    ]

    try:
        response = client.embeddings.create( #4
            model=MODEL_NAME,
            input=sentences
        )

        print("\n=== Generated embeddings with OpenAI client ===")
        print("Successfully generated embeddings!")
        print(f"Number of embeddings: {len(response.data)}") #5
        print(f"Embedding dimension: {len(response.data[0].embedding)}")
        print(f"First embedding, first few values: {response.data[0].embedding[:5]}")
    except Exception as e:
        print(f"Error using client: {str(e)}")


if __name__ == "__main__":
    main()
Here's what the code does:
1. Sets MODEL_NAME to the Hugging Face model being served, sentence-transformers/all-mpnet-base-v2.
2. Creates an OpenAI client pointed at the local MAX Serve endpoint. (The API_KEY value is arbitrary; MAX Serve does not use one, but the OpenAI client requires this value not be blank.)
3. Defines the sentences to embed.
4. Sends all of the sentences to the embeddings endpoint in a single request.
5. Prints how many embeddings were returned, their dimension, and the first few values of the first embedding.
Note how the code is a drop-in replacement for the proprietary OpenAI API: this is a key advantage of building with MAX!
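To make that compatibility concrete, you can also skip the OpenAI client and call the endpoint directly over HTTP. This is a sketch under the assumption that the server exposes the standard OpenAI embeddings route at /v1/embeddings (consistent with how main.py behaves); it uses the requests library, which is not one of the recipe's dependencies.

import requests

BASE_URL = "http://localhost:8000/v1"

# POST to the embeddings route directly. The bearer token value is arbitrary:
# MAX Serve does not check it, but the OpenAI request format includes one.
response = requests.post(
    f"{BASE_URL}/embeddings",
    headers={"Authorization": "Bearer local"},
    json={
        "model": "sentence-transformers/all-mpnet-base-v2",
        "input": ["The bright sun shines on the old garden."],
    },
    timeout=30,
)
response.raise_for_status()
payload = response.json()
print(len(payload["data"][0]["embedding"]))  # expect 768 for all-mpnet-base-v2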
Now that you've created embeddings with MAX Serve, you can explore more features and join our developer community. Here are some resources to help you continue your journey:
Learn more about the magic CLI in this Magic tutorial
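As a next step, a natural extension of main.py is a tiny semantic-search demo: embed a query alongside the recipe's sentences and rank the sentences by cosine similarity. The sketch below assumes the MAX Serve endpoint from this recipe is still running and adds NumPy, which is not one of the recipe's dependencies.

import numpy as np
from openai import OpenAI

MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

sentences = [
    "Rice is often served in round bowls.",
    "The juice of lemons makes fine punch.",
    "The bright sun shines on the old garden.",
    "The soft breeze came across the meadow.",
    "The small pup gnawed a hole in the sock.",
]
query = "a sunny day outside"

# Embed the documents and the query in one request.
response = client.embeddings.create(model=MODEL_NAME, input=sentences + [query])
vectors = np.array([item.embedding for item in response.data])
docs, q = vectors[:-1], vectors[-1]

# Cosine similarity between the query and every sentence, highest first.
scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sentence}")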
DETAILS
THE CODE
max-serve-openai-embeddings
AUTHOR
Bill Welense
AVAILABLE TASKS
magic run server
magic run main
PROBLEMS WITH THE CODE?
File an Issue