DeepSeek-R1
Introduction
The DeepSeek-R1 series introduces two reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained with large-scale reinforcement learning (RL) without an initial supervised fine-tuning (SFT) stage, which yields impressive reasoning capabilities but leaves challenges in readability and language consistency. To address these issues and further improve performance, DeepSeek-R1 incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to leading models such as OpenAI-o1 across math, code, and reasoning tasks, and is released as open source. Distilled versions based on Llama and Qwen set new benchmarks for dense models.
For local use, see the Usage Recommendations section.
Model Summary
Post-Training: Large-Scale Reinforcement Learning on the Base Model
By applying RL directly to the base model without SFT, DeepSeek-R1-Zero develops strong problem-solving and self-reflection behaviors, highlighting its value to the research community. This approach validates that reasoning capabilities in large language models (LLMs) can be incentivized purely through RL, paving the way for future advances.
Distillation: Power in Smaller Models
Our results demonstrate that the reasoning patterns of larger models can be distilled into smaller, high-performing models. The open-source release of DeepSeek-R1 supports ongoing research on model distillation, and the resulting distilled dense models achieve remarkable benchmark performance.
Model Downloads
DeepSeek-R1 Models
| Model | Total Params | Activated Params | Context Length | Download |
|---|---|---|---|---|
| DeepSeek-R1-Zero | 671B | 37B | 128K | Download Link |
| DeepSeek-R1 | 671B | 37B | 128K | Download Link |
DeepSeek-R1 models were developed from DeepSeek-V3-Base. Additional architecture details are available in the DeepSeek-V3 repository.
DeepSeek-R1-Distill Models
| Model | Base Model | Download |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | Download Link |
These distilled models were fine-tuned on reasoning data generated by DeepSeek-R1 and should be run with the configuration and tokenizer settings shipped with each checkpoint for optimal results, as sketched below.
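As a quick illustration (not part of the official instructions), the snippet below loads a distilled checkpoint's own config and tokenizer via Hugging Face transformers; the repository id mirrors the table above and should be verified against the hosting page.

```python
# Minimal sketch: load the distilled checkpoint's own config and tokenizer
# (rather than reusing the base Qwen2.5 files), since the distilled repo ships
# its own settings and chat template. Repository id per the table above.
from transformers import AutoConfig, AutoTokenizer

repo = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
config = AutoConfig.from_pretrained(repo)        # fine-tuned model configuration
tokenizer = AutoTokenizer.from_pretrained(repo)  # includes the chat template
print(config.model_type, tokenizer.chat_template is not None)
```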
Evaluation Results
DeepSeek-R1 Evaluation
DeepSeek-R1 is evaluated with a maximum generation length of 32,768 tokens; for benchmarks requiring sampling, a temperature of 0.6 and a top-p of 0.95 are used. A snapshot of the English benchmark results is shown below (a minimal Pass@1 sketch follows the table):
| Category | Benchmark (Metric) | DeepSeek-R1 |
|---|---|---|
| English | MMLU (Pass@1) | 90.8 |
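For context on the Pass@1 numbers above, here is a minimal sketch of how Pass@1 can be estimated from multiple samples drawn with the stated settings. The answer checker below is a hypothetical placeholder, not the official evaluation harness.

```python
# Minimal sketch of Pass@1 estimation under the stated sampling settings:
# draw several responses per problem at temperature 0.6 / top-p 0.95, score
# each response with a task-specific checker, and average correctness.
from statistics import mean

SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 32768}

def pass_at_1(responses: list[str], check_answer) -> float:
    """Fraction of sampled responses judged correct (averaged, not best-of)."""
    return mean(1.0 if check_answer(r) else 0.0 for r in responses)

# Toy usage with a placeholder checker that looks for the expected answer string.
print(pass_at_1(["answer: 42", "answer: 41"], lambda r: "42" in r))  # 0.5
```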
Distilled Model Evaluation
Distilled models such as DeepSeek-R1-Distill-Qwen-32B perform strongly across several benchmarks, confirming their usefulness for a wide range of tasks.
Chat Website & API Platform
Interact with DeepSeek-R1 via DeepSeek's official chat site or explore the OpenAI-Compatible API at DeepSeek's platform.
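As a rough sketch (not official documentation), the OpenAI-compatible endpoint can be called with the standard OpenAI Python SDK. The base URL and model name below follow DeepSeek's platform docs and should be verified before use.

```python
# Minimal sketch of calling DeepSeek-R1 through the OpenAI-compatible API.
# Verify the base_url and model name against the current platform docs and
# supply your own API key.
from openai import OpenAI

client = OpenAI(api_key="<your-api-key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # reasoning-model endpoint name per platform docs
    messages=[
        # Per the usage recommendations below, put all instructions in the
        # user turn rather than a system prompt.
        {"role": "user", "content": "Please reason step by step: what is 17 * 24?"}
    ],
)
print(response.choices[0].message.content)
```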
How to Run Locally
For instructions on running DeepSeek-R1 locally, see the DeepSeek-V3 repository; a minimal inference sketch for a distilled checkpoint follows below.
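The following is a minimal local-inference sketch, assuming a recent vLLM release and one of the distilled checkpoints (the full 671B model requires a much larger multi-GPU setup); it is illustrative rather than the officially recommended setup.

```python
# Minimal local-inference sketch with vLLM and a distilled checkpoint.
# Repository id and parallelism settings are assumptions; tune to your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    tensor_parallel_size=2,   # split across 2 GPUs; adjust to your machine
    max_model_len=32768,      # long context for extended reasoning traces
)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

# chat() applies the model's own chat template to the message list.
outputs = llm.chat(
    [{"role": "user", "content": "Please reason step by step: is 221 prime?"}],
    params,
)
print(outputs[0].outputs[0].text)
```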
Usage Recommendations
To achieve optimal performance with the DeepSeek-R1 models, consider the following recommendations (a sketch applying them appears after the list):
- Set the temperature within 0.5-0.7 (0.6 is recommended) to avoid endless repetition or incoherent outputs.
- Avoid system prompts; instructions should reside in the user prompt.
- Encourage detailed reasoning by including a directive such as "Please reason step by step" in the prompt; for math problems, also ask for the final answer within \boxed{}.
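Here is a minimal sketch mapping these recommendations onto a Hugging Face transformers call, assuming one of the distilled checkpoints; the model id and exact values are illustrative.

```python
# Minimal sketch applying the usage recommendations: no system message, the
# full instruction in the user turn, and temperature/top-p in the suggested range.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [  # no system role, per the recommendation above
    {"role": "user", "content": "Please reason step by step: solve 3x + 5 = 20."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.95)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```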
License
Licensed under the MIT License, the DeepSeek-R1 series permits commercial use and modification, including model distillation. The distilled models are additionally subject to the Apache 2.0 or Llama3 license of their respective base models.
Citation
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
author={DeepSeek-AI and others},
year={2025},
eprint={2501.12948},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.12948},
}
Contact
For queries, feel free to reach out via service@deepseek.com.