Use AnythingLLM with MAX Serve to chat with Llama 3.1
Building on the solid foundation MAX provides, adding a robust user interface is a natural next step.
In this recipe, you'll set up AnythingLLM as a chat front end for Llama 3.1 served by MAX.
AnythingLLM is a powerful platform offering a familiar chat interface for interacting with open-source AI models. Like MAX, AnythingLLM empowers users to maintain complete ownership of their AI infrastructure, avoiding vendor lock-in and enhancing privacy. With over 30,000 stars on GitHub, AnythingLLM has become one of the most popular solutions for private AI deployment. Its versatility makes it a natural pairing with MAX for a complete end-to-end AI solution.
Please make sure your system meets our minimum requirements.
To proceed, ensure you have the magic CLI installed and that magic --version reports 0.7.2 or newer:
curl -ssL https://magic.modular.com/ | bash
...or update magic to the latest version:
magic self-update
Then install max-pipelines via:
magic global install -u max-pipelines
A valid Hugging Face token ensures access to the model and weights.
We'll use Docker to run the AnythingLLM container. Follow the instructions in the Docker documentation if you need to install it.
Download the code for this recipe using the magic CLI:
magic init max-serve-anythingllm --from modular/max-recipes/max-serve-anythingllm
cd max-serve-anythingllm
Next, include your Hugging Face token in a .env file by running:
echo "HUGGING_FACE_HUB_TOKEN=your_token_here" >> .env
You can start MAX and AnythingLLM with one command:
magic run app
This command is defined in the pyproject.toml file, which we will cover later.
MAX Serve is ready once you see a line containing the following in the log output:
max.serve: Server ready on http://0.0.0.0:3002/
AnythingLLM is ready once you see a line like the following in the log output:
Primary server in HTTP mode listening on port 3001
Note: the port that AnythingLLM reports can differ from the port you configure in pyproject.toml as the UI_PORT.
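If you'd rather not watch the logs, you can also poll the two ports programmatically. Here's a minimal sketch (not part of the recipe) that assumes the default ports from pyproject.toml:

```python
# Poll both service ports until they accept connections.
# Assumes the defaults: 3002 for MAX Serve, 3001 for AnythingLLM.
import socket
import time

def wait_for_port(port: int, host: str = "localhost", timeout: float = 600.0) -> None:
    """Block until a TCP connection to host:port succeeds, or raise on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # Try to open (and immediately close) a TCP connection.
            with socket.create_connection((host, port), timeout=5):
                print(f"Port {port} is accepting connections")
                return
        except OSError:
            time.sleep(2)  # not ready yet; try again shortly
    raise TimeoutError(f"Port {port} was not ready after {timeout} seconds")

if __name__ == "__main__":
    wait_for_port(3002)  # MAX Serve
    wait_for_port(3001)  # AnythingLLM
```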
Once both servers are ready, launch AnythingLLM in your browser at http://localhost:3001
When you run the command magic run app, the Python script main.py orchestrates getting both MAX and AnythingLLM up and running. Notably, it will create a data folder if one doesn't exist. AnythingLLM uses this folder as persistent storage for your settings, chat history, and other data. The location of the folder is configurable in pyproject.toml by changing the value of UI_STORAGE_LOCATION.
The first time you launch AnythingLLM in your browser, you will see a welcome screen. Choose Get Started, then complete the following steps:
- LLM provider base URL: http://host.docker.internal:3002/v1
- API key: local (MAX doesn't require an API key, but this field can't be blank)
- Chat Model Name: modularai/Llama-3.1-8B-Instruct-GGUF
- Token context window: 16384 (must match MAX_CONTEXT_LENGTH from pyproject.toml)
- Max tokens: 1024
Note: Don't let the modularai in the Chat Model Name field limit you. MAX supports any PyTorch model on Hugging Face and includes special acceleration for the most common architectures. Modular simply hosts weights for Llama 3.1 to get you up and running quickly. (Access to the official meta-llama repo is gated and requires waiting for approval.)
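These settings work outside the UI as well. As a quick sanity check, the sketch below queries MAX Serve's OpenAI-compatible endpoint directly with the same values; it assumes you have the openai Python package installed and kept the default port:

```python
# Sketch: query MAX Serve's OpenAI-compatible endpoint directly, using the
# same values you entered in AnythingLLM. Assumes the openai package
# (pip install openai) and the default MAX_SERVE_PORT of 3002.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3002/v1",  # host.docker.internal is only needed from inside Docker
    api_key="local",                      # MAX ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```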
Let's explore how the key components of this recipe work together.
The recipe is configured in the pyproject.toml file, which defines:
Environment variables to control the ports, storage locations, and additional settings:
- MAX_SECRETS_LOCATION = ".env": Location of the file containing your Hugging Face token
- MAX_CONTEXT_LENGTH = "16384": LLM context window size
- MAX_BATCH_SIZE = "1": LLM batch size (use 1 when running on CPU)
- MAX_SERVE_PORT = "3002": Port for MAX Serve
- UI_PORT = "3001": Port for AnythingLLM
- UI_STORAGE_LOCATION = "./data": Persistent storage for AnythingLLM
- UI_CONTAINER_NAME = "anythingllm-max": Name for referencing the container with Docker
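Because these are ordinary environment variables, any script run as a magic task can read them directly. A minimal, illustrative sketch (the names and defaults come from the list above):

```python
# Illustrative only: read the recipe's settings from the environment,
# falling back to the defaults documented in pyproject.toml when unset.
import os

MAX_SERVE_PORT = int(os.environ.get("MAX_SERVE_PORT", "3002"))
MAX_CONTEXT_LENGTH = int(os.environ.get("MAX_CONTEXT_LENGTH", "16384"))
MAX_BATCH_SIZE = int(os.environ.get("MAX_BATCH_SIZE", "1"))
UI_PORT = int(os.environ.get("UI_PORT", "3001"))
UI_STORAGE_LOCATION = os.environ.get("UI_STORAGE_LOCATION", "./data")
UI_CONTAINER_NAME = os.environ.get("UI_CONTAINER_NAME", "anythingllm-max")

if __name__ == "__main__":
    print(f"MAX Serve port: {MAX_SERVE_PORT}, AnythingLLM port: {UI_PORT}")
    print(f"AnythingLLM storage: {UI_STORAGE_LOCATION} (container {UI_CONTAINER_NAME})")
```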
Tasks you can run with the magic run command:
- app: Runs the main Python script that coordinates both services
- setup: Sets up persistent storage for AnythingLLM
- ui: Launches the AnythingLLM Docker container
- llm: Starts MAX Serve with Llama 3.1
- clean: Cleans up network resources for both services

Dependencies for running both services:
- max-pipelines CLI

The setup.py script handles the initial setup for AnythingLLM:
- Creates the persistent storage folder at the UI_STORAGE_LOCATION defined in pyproject.toml
- Ensures a .env file is present for AnythingLLM settings
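The setup.py source isn't reproduced here, but the pattern is straightforward. A minimal sketch of equivalent behavior (an assumption, not the actual script):

```python
# A minimal sketch of the kind of setup work described above
# (an illustration, not the actual setup.py).
import os
from pathlib import Path

def ensure_storage_and_env() -> None:
    # Create the persistent storage folder AnythingLLM mounts for its
    # settings, chat history, and other data.
    storage = Path(os.environ.get("UI_STORAGE_LOCATION", "./data"))
    storage.mkdir(parents=True, exist_ok=True)

    # Make sure a .env file exists for AnythingLLM's settings; here we
    # assume it lives inside the storage folder.
    (storage / ".env").touch(exist_ok=True)

if __name__ == "__main__":
    ensure_storage_and_env()
```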
When you run magic run app, the main.py script coordinates everything necessary to start and shut down both services:
Command-line interface:
python main.py llm ui --pre setup --post clean

- run_app(): Runs the magic run tasks concurrently
- run_task(): Runs an individual task, such as setup or clean, as defined in pyproject.toml
- Uses atexit to ensure cleanup
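As a rough illustration of that pattern, here is a sketch of concurrent task launching with an atexit cleanup hook (the task names match pyproject.toml, but the code is not the actual main.py):

```python
# A simplified sketch of the orchestration pattern described above
# (not the actual main.py): run magic tasks concurrently and guarantee
# the clean task runs when the program exits.
import atexit
import subprocess
import sys

processes: list[subprocess.Popen] = []

def run_task(task: str) -> subprocess.Popen:
    """Start a single `magic run <task>` as a child process."""
    proc = subprocess.Popen(["magic", "run", task])
    processes.append(proc)
    return proc

def clean_up() -> None:
    """Terminate any still-running tasks, then run the clean task."""
    for proc in processes:
        if proc.poll() is None:
            proc.terminate()
    subprocess.run(["magic", "run", "clean"], check=False)

if __name__ == "__main__":
    atexit.register(clean_up)                               # ensure cleanup on exit
    subprocess.run(["magic", "run", "setup"], check=True)   # pre task
    for task in sys.argv[1:] or ["llm", "ui"]:              # main tasks
        run_task(task)
    for proc in processes:                                  # wait for both services
        proc.wait()
```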
Cleanup process:
- Post tasks (such as clean) are automatically run when the application exits
- The clean task terminates all processes, releases all ports, and removes the AnythingLLM container from Docker

Now that you're up and running with AnythingLLM on MAX, you can explore more features and join our developer community. Here are some resources to help you continue your journey:
- Learn more about the magic CLI in our Magic tutorial

DETAILS
THE CODE: max-serve-anythingllm
AUTHOR: Bill Welense
AVAILABLE TASKS: magic run app, magic run clean
PROBLEMS WITH THE CODE? File an Issue