DeepSeek-R1
Introduction
The DeepSeek-R1 series introduces two reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained with large-scale reinforcement learning (RL) without an initial supervised fine-tuning (SFT) stage, which yields impressive reasoning capabilities but leaves challenges in readability and language consistency. To address these issues and further improve performance, DeepSeek-R1 incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to leading models such as OpenAI-o1 across math, code, and reasoning tasks, and is released as open source. Distilled versions based on Llama and Qwen set new benchmarks for dense models.
For local use, see the Usage Recommendations section.
Model Summary
Post-Training: Large-Scale Reinforcement Learning on the Base Model
By applying RL directly to the base model without SFT, DeepSeek-R1-Zero develops strong problem-solving and self-reflection behaviors, highlighting its value to the research community. This approach validates that reasoning capabilities in large language models (LLMs) can be incentivized purely through RL, paving the way for future advances.
Distillation: Power in Smaller Models
Our results demonstrate that the reasoning patterns of larger models can be distilled into smaller, high-performing models. The open-source release of DeepSeek-R1 supports ongoing research on model distillation, and the resulting distilled dense models achieve remarkable benchmark performance.
Model Downloads
DeepSeek-R1 Models
| Model | Total Params | Activated Params | Context Length | Download |
|---|---|---|---|---|
| DeepSeek-R1-Zero | 671B | 37B | 128K | Download Link |
| DeepSeek-R1 | 671B | 37B | 128K | Download Link |
DeepSeek-R1 models were developed from DeepSeek-V3-Base. Additional architecture details are available in the DeepSeek-V3 repository.
DeepSeek-R1-Distill Models
| Model | Base Model | Download |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | Download Link |
These distilled models were fine-tuned on reasoning data generated by DeepSeek-R1 and should be run with the configuration and tokenizer settings shipped with each checkpoint for optimal results, as sketched below.
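As a quick illustration (not part of the official instructions), the snippet below loads a distilled checkpoint's own config and tokenizer via Hugging Face transformers; the repository id mirrors the table above and should be verified against the hosting page.

```python
# Minimal sketch: load the distilled checkpoint's own config and tokenizer
# (rather than reusing the base Qwen2.5 files), since the distilled repo ships
# its own settings and chat template. Repository id per the table above.
from transformers import AutoConfig, AutoTokenizer

repo = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
config = AutoConfig.from_pretrained(repo)        # fine-tuned model configuration
tokenizer = AutoTokenizer.from_pretrained(repo)  # includes the chat template
print(config.model_type, tokenizer.chat_template is not None)
```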
Evaluation Results
DeepSeek-R1 Evaluation
DeepSeek-R1 is evaluated with a maximum generation length of 32,768 tokens; for benchmarks requiring sampling, a temperature of 0.6 and a top-p of 0.95 are used. A snapshot of the English benchmark results is shown below (a minimal Pass@1 sketch follows the table):
| Category | Benchmark (Metric) | DeepSeek-R1 |
|---|---|---|
| English | MMLU (Pass@1) | 90.8 |
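For context on the Pass@1 numbers above, here is a minimal sketch of how Pass@1 can be estimated from multiple samples drawn with the stated settings. The answer checker below is a hypothetical placeholder, not the official evaluation harness.

```python
# Minimal sketch of Pass@1 estimation under the stated sampling settings:
# draw several responses per problem at temperature 0.6 / top-p 0.95, score
# each response with a task-specific checker, and average correctness.
from statistics import mean

SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 32768}

def pass_at_1(responses: list[str], check_answer) -> float:
    """Fraction of sampled responses judged correct (averaged, not best-of)."""
    return mean(1.0 if check_answer(r) else 0.0 for r in responses)

# Toy usage with a placeholder checker that looks for the expected answer string.
print(pass_at_1(["answer: 42", "answer: 41"], lambda r: "42" in r))  # 0.5
```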
Distilled Model Evaluation
Distilled models such as DeepSeek-R1-Distill-Qwen-32B perform strongly across several benchmarks, confirming their usefulness for a wide range of tasks.
Chat Website & API Platform
Interact with DeepSeek-R1 via DeepSeek's official chat site or explore the OpenAI-Compatible API at DeepSeek's platform.
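As a rough sketch (not official documentation), the OpenAI-compatible endpoint can be called with the standard OpenAI Python SDK. The base URL and model name below follow DeepSeek's platform docs and should be verified before use.

```python
# Minimal sketch of calling DeepSeek-R1 through the OpenAI-compatible API.
# Verify the base_url and model name against the current platform docs and
# supply your own API key.
from openai import OpenAI

client = OpenAI(api_key="<your-api-key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # reasoning-model endpoint name per platform docs
    messages=[
        # Per the usage recommendations below, put all instructions in the
        # user turn rather than a system prompt.
        {"role": "user", "content": "Please reason step by step: what is 17 * 24?"}
    ],
)
print(response.choices[0].message.content)
```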
How to Run Locally
For instructions on running DeepSeek-R1 locally, see the DeepSeek-V3 repository; a minimal inference sketch for a distilled checkpoint follows below.
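The following is a minimal local-inference sketch, assuming a recent vLLM release and one of the distilled checkpoints (the full 671B model requires a much larger multi-GPU setup); it is illustrative rather than the officially recommended setup.

```python
# Minimal local-inference sketch with vLLM and a distilled checkpoint.
# Repository id and parallelism settings are assumptions; tune to your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    tensor_parallel_size=2,   # split across 2 GPUs; adjust to your machine
    max_model_len=32768,      # long context for extended reasoning traces
)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

# chat() applies the model's own chat template to the message list.
outputs = llm.chat(
    [{"role": "user", "content": "Please reason step by step: is 221 prime?"}],
    params,
)
print(outputs[0].outputs[0].text)
```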
Usage Recommendations
To achieve optimal performance with the DeepSeek-R1 models, consider the following recommendations (a sketch applying them appears after the list):
- Set the temperature within 0.5-0.7 (0.6 is recommended) to avoid endless repetition or incoherent outputs.
- Avoid system prompts; instructions should reside in the user prompt.
- Encourage detailed reasoning by including a directive such as "Please reason step by step" in the prompt; for math problems, also ask for the final answer within \boxed{}.
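Here is a minimal sketch mapping these recommendations onto a Hugging Face transformers call, assuming one of the distilled checkpoints; the model id and exact values are illustrative.

```python
# Minimal sketch applying the usage recommendations: no system message, the
# full instruction in the user turn, and temperature/top-p in the suggested range.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [  # no system role, per the recommendation above
    {"role": "user", "content": "Please reason step by step: solve 3x + 5 = 20."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.95)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```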
License
Licensed under the MIT License, the DeepSeek-R1 series permits commercial use and modification, including model distillation. The distilled models are additionally subject to the Apache 2.0 or Llama3 license of their respective base models.
Citation
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
author={DeepSeek-AI and others},
year={2025},
eprint={2501.12948},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.12948},
}
Contact
For queries, feel free to reach out via service@deepseek.com.