Phi-4-15B Model | MAX Builds

Version: 15B GPU BF16 (PyTorch)

This version is not quantized and a GPU is recommended.

Install our magic package manager:
```
curl -ssL https://magic.modular.com/ | bash
```
Then run the source command that's printed in your terminal.

Install Max Pipelines in order to run this model.

```
magic global install max-pipelines && magic global update
```

Start a local endpoint for phi-4/15B:

```
max-pipelines serve --huggingface-repo-id=unsloth/phi-4-GGUF
```

The endpoint is ready when you see the URI printed in your terminal:

```
Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
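If you would rather have a script wait for the endpoint than watch the terminal, a small poller can check when the port starts accepting connections. This is a minimal sketch and not part of MAX: `wait_for_port` is a hypothetical helper, and it assumes the server listens on port 8000 as in the message above. An open port only shows that something is listening; it does not by itself prove the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeds before timeout_s elapses.
    Hypothetical helper for illustration; not a MAX API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError if the port is closed
            # or the attempt times out.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example usage (assumes the endpoint from the serve command above):
# if wait_for_port("127.0.0.1", 8000, timeout_s=300.0):
#     print("endpoint reachable")
```

Polling the TCP port keeps the check independent of any particular HTTP route, at the cost of being a weaker signal than a successful completion request.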

Now open another terminal to send a request using curl:

```
curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "unsloth/phi-4-GGUF",
    "stream": true,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
}' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n//g'
```
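The same streaming request can be made from Python using only the standard library. The sketch below assumes the endpoint emits OpenAI-style server-sent events: lines of the form `data: {...}` whose `choices[].delta.content` field carries each text fragment, ending with `data: [DONE]`. The helper names `iter_stream_text` and `stream_chat` are illustrative and not part of MAX.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming chunk lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(url: str, model: str, messages: list) -> Iterator[str]:
    """POST a streaming chat request and yield text as it arrives."""
    body = json.dumps(
        {"model": model, "stream": True, "messages": messages}
    ).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # An HTTP response object iterates over its body line by line.
        yield from iter_stream_text(line.decode("utf-8") for line in resp)


# Example usage (requires the local endpoint to be running):
# msgs = [{"role": "user", "content": "Who won the World Series in 2020?"}]
# for piece in stream_chat("http://0.0.0.0:8000/v1/chat/completions",
#                          "unsloth/phi-4-GGUF", msgs):
#     print(piece, end="", flush=True)
```

Parsing each chunk as JSON avoids the edge cases of the `grep`/`sed` pipeline, such as escaped quotes inside the generated text.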

🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.

Deploy this model to cloud

Phi-4

Model Summary

Phi-4, developed by Microsoft Research, is a 14-billion-parameter transformer model for text generation, aimed at advancing research in language models. It is trained on a combination of synthetic datasets and public domain resources, with a focus on reasoning and instruction adherence. The model accepts a chat-style input format with a 16,000-token context length. Training used 1,920 H100-80G GPUs over 21 days and processed 9.8 trillion tokens. The model is licensed under MIT and was released on December 12, 2024.
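To put the training figures in perspective, the numbers above can be turned into a rough average throughput. This is back-of-the-envelope arithmetic only: it assumes all 1,920 GPUs ran continuously for the full 21 days, which the summary does not state, so the result is a lower bound on the throughput achieved while the GPUs were actually in use.

```python
# Figures taken from the model summary above.
GPUS = 1_920
DAYS = 21
TOKENS = 9.8e12

# Assumption: every GPU is busy 24 hours a day for the whole run.
gpu_hours = GPUS * DAYS * 24
tokens_per_gpu_hour = TOKENS / gpu_hours
tokens_per_gpu_second = tokens_per_gpu_hour / 3600

print(f"GPU-hours: {gpu_hours:,}")
print(f"Tokens per GPU-hour: {tokens_per_gpu_hour:,.0f}")
print(f"Tokens per GPU-second: {tokens_per_gpu_second:,.0f}")
```

Under that assumption the run amounts to 967,680 GPU-hours, or roughly 10 million tokens per GPU-hour.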

Intended Use

Phi-4 is intended for use as an AI research tool and generative AI feature component, particularly beneficial in memory-constrained and latency-bound settings and scenarios requiring advanced reasoning. It is not specifically tailored for all downstream applications, and developers should address limitations like accuracy, safety, and fairness before deployment, especially in high-risk use cases. Adherence to relevant laws and regulations is necessary.

Data Overview

Training Data

The training data for Phi-4 extends from its predecessor, covering:

Filtered high-quality public documents and educational data.
Synthetic data for teaching topics like math and coding.
Academic books and Q&A datasets.
High-quality chat data for demonstrating human preferences.

Multilingual data makes up about 8% of the dataset. Data selection emphasized quality in order to strengthen reasoning abilities.

Benchmark Datasets

Phi-4 was evaluated using OpenAI’s SimpleEval and internal benchmarks, including:

MMLU for language understanding
MATH for competition math problems
GPQA for science queries
DROP for comprehension and reasoning
MGSM for multilingual math
HumanEval for code generation
SimpleQA for factual responses

Safety

Phi-4 incorporates safety post-training that combines Supervised Fine-Tuning and Direct Preference Optimization across various datasets. Its safety was evaluated through open-source benchmarks and adversarial simulations covering threats such as jailbreaks and encoding-based attacks.
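For readers unfamiliar with Direct Preference Optimization, the sketch below shows the standard per-example DPO loss from the general literature. It is not Phi-4's training code, and the function name, argument names, and the `beta` default are illustrative assumptions. Given log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being trained and under a frozen reference model, the loss is the negative log-sigmoid of the scaled difference in log-ratios.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard per-example DPO loss (illustrative, not Phi-4's code).

    loss = -log(sigmoid(beta * margin)), where margin is how much more
    the policy prefers the chosen response than the reference does.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch for numerical stability.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# When the policy matches the reference, the loss equals log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
```

The loss falls below log(2) as the policy shifts probability toward the chosen response relative to the reference, and rises above it in the opposite case.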

Model Quality

Phi-4's performance across various benchmarks showcases its high-level capabilities, with notable achievements in science and mathematics assessments. Comparison with other models highlights its strong competitive position.

Responsible AI Considerations

Developers should be aware that Phi-4 can produce biased or inaccurate content, especially in non-English languages and around cultural stereotypes. The model isn't recommended for sensitive contexts without additional safeguards, and developers should follow responsible AI practices while conforming to laws and regulations. Safety measures such as Azure AI Content Safety are advisable to enhance user protection and compliance.

Citations

Phi-4 Technical Report

Metadata

Version: 15B GPU BF16

This code works on compatible Linux machines.
We are actively working on enabling MAX Serve for macOS ARM64 as well.

You can quickly deploy phi-4-15B to an endpoint using our MAX container. It includes the latest version of MAX with GPU support and our Python-based inference server called MAX Serve.

With the following Docker command, you’ll get an OpenAI-compatible endpoint running phi-4-15B:

```
docker run --gpus 1 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_HUB_ENABLE_HF_TRANSFER=1" \
    --env "HF_TOKEN=" \
    -p 8000:8000 \
    docker.modular.com/modular/max-openai-api:nightly \
    --huggingface-repo-id unsloth/phi-4-GGUF
```

To download the model from Hugging Face, set the HF_TOKEN value to your access token. The token is not required if the model is from https://huggingface.co/modularai.
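Once the container is up, the endpoint can be called without streaming as well. The sketch below builds a single chat completion request and extracts the reply. It assumes the container is reachable at `http://localhost:8000` (from the `-p 8000:8000` mapping) and that the response follows the OpenAI shape, with the text at `choices[0].message.content`. The helper names `build_chat_request` and `extract_reply` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build a non-streaming chat completion request (illustrative helper)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


# Example usage (requires the container to be running):
# req = build_chat_request(
#     "http://localhost:8000",
#     "unsloth/phi-4-GGUF",
#     [{"role": "user", "content": "Who won the World Series in 2020?"}],
# )
# with urllib.request.urlopen(req) as resp:
#     print(extract_reply(resp.read()))
```

Keeping request construction and response parsing in separate functions makes both easy to check without a running server.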

Learn more

For more information about the container image, see the MAX container documentation.

To learn more about how to deploy MAX to the cloud, check out our MAX Serve tutorials.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

DETAILS

MODEL CLASS: PyTorch

HARDWARE: GPU

QUANTIZATION: BF16

ARCHITECTURE: PyTorch

MAX GITHUB: Modular / MAX

MODEL: unsloth/phi-4-GGUF
