Phi-4
Model Summary
Phi-4, developed by Microsoft Research, is a cutting-edge 14-billion-parameter transformer model designed for text generation, aimed at advancing research in language models. It is built using a combination of synthetic datasets and public domain resources, focusing on superior reasoning and instruction adherence. The model supports a chat-like input format with a 16,000-token context length, and its training employed 1,920 H100-80G GPUs over 21 days, processing 9.8 trillion tokens. The model is licensed under MIT and was released on December 12, 2024.
Intended Use
Phi-4 is intended for use as an AI research tool and generative AI feature component, particularly beneficial in memory-constrained and latency-bound settings and scenarios requiring advanced reasoning. It is not specifically tailored for all downstream applications, and developers should address limitations like accuracy, safety, and fairness before deployment, especially in high-risk use cases. Adherence to relevant laws and regulations is necessary.
Data Overview
Training Data
The training data for Phi-4 extends from its predecessor, covering:
- Filtered high-quality public documents and educational data.
- Synthetic data for teaching topics like math and coding.
- Academic books and Q&A datasets.
- High-quality chat data for demonstrating human preferences.
Multilingual data comprises about 8% of the dataset, emphasizing data quality to enhance reasoning abilities.
Benchmark Datasets
Phi-4 was evaluated using OpenAI’s SimpleEval and internal benchmarks, including:
- MMLU for language understanding
- MATH for competition math problems
- GPQA for science queries
- DROP for comprehension and reasoning
- MGSM for multi-lingual math
- HumanEval for code generation
- SimpleQA for factual responses.
Safety
Phi-4 incorporates advanced safety post-training, combining Supervised Fine-Tuning and Direct Preference Optimization with various datasets. Its safety was evaluated through open-source benchmarks and adversarial simulations, ensuring preparedness against potential threats like jailbreaks and encoding-based attacks.
Model Quality
Phi-4's performance across various benchmarks showcases its high-level capabilities, with notable achievements in science and mathematics assessments. Comparison with other models highlights its strong competitive position.
Responsible AI Considerations
Developers should be aware of Phi-4’s ability to produce biased or inaccurate content, especially concerning non-English languages and cultural stereotypes. The model isn't recommended for sensitive contexts without additional safeguards, and developers should follow responsible AI practices while conforming to laws and regulations. Safety measures such as Azure AI Content Safety are advisable to enhance user protection and compliance standards.
Citations