DeepSeek-llama3.1-Bllossom-8B enhances multilingual AI performance, notably improving Korean language inference.
Version: 8B-Q4_K_M (CPU, q4_k_m)
You can quickly deploy DeepSeek-llama3.1-Bllossom-8B-Q4_K_M to an endpoint using our MAX container, which includes the latest version of MAX with GPU support and our Python-based inference server, MAX Serve. The following Docker command starts an OpenAI-compatible endpoint running DeepSeek-llama3.1-Bllossom-8B-Q4_K_M:
docker run --gpus 1 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_HUB_ENABLE_HF_TRANSFER=1" \
  --env "HF_TOKEN=" \
  -p 8000:8000 \
  docker.modular.com/modular/max-openai-api:nightly \
  --huggingface-repo-id kenonix/DeepSeek-llama3.1-Bllossom-8B-Q4_K_M-GGUF \
  --weight-path=kenonix/DeepSeek-llama3.1-Bllossom-8B-Q4_K_M-GGUF/deepseek-llama3.1-bllossom-8b-q4_k_m.gguf
To download the model from Hugging Face, fill in the HF_TOKEN value with your access token, unless the model is hosted under https://huggingface.co/modularai.
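Once the container is running, you can smoke-test the endpoint with a standard OpenAI-style chat completion request. The snippet below is a minimal sketch, not part of the model card: it assumes the server is reachable at localhost:8000, and the "model" value is an assumption based on the repo ID above (query the /v1/models endpoint to confirm the exact ID the server expects).

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kenonix/DeepSeek-llama3.1-Bllossom-8B-Q4_K_M-GGUF",
        "messages": [
          {"role": "user", "content": "안녕하세요! 한국어로 답해 주세요."}
        ],
        "max_tokens": 128
      }'

Because the response follows the OpenAI chat completions schema, existing OpenAI client libraries can also target this endpoint by overriding their base URL.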
For more information about the container image, see the MAX container documentation.
To learn more about how to deploy MAX to the cloud, check out our MAX Serve tutorials.
DETAILS
MAX Models are popular open-source models converted to MAX’s native graph format. Models carrying the MAX Model label are either state of the art (SOTA) or under active development. Learn more about MAX Models.
MAX GITHUB
Modular / MAX
BASE MODEL
kenonix/DeepSeek-llama3.1-Bllossom-8B-Q4_K_M-GGUF
QUANTIZED BY
kenonix/DeepSeek-llama3.1-Bllossom-8B-Q4_K_M-GGUF
TAGS
ENTERPRISES