DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

30 January 2025

Introducing DeepSeekMath 7B, a model that tackles the challenge of mathematical reasoning in open language models. Continually pre-trained on 120B math-related tokens, DeepSeekMath 7B scores 51.7% on the competition-level MATH benchmark without external toolkits or voting techniques, approaching the performance of Gemini-Ultra and GPT-4. With self-consistency over 64 samples, it reaches 60.9%.
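For context, self-consistency simply samples many reasoning chains per problem and keeps the most frequent final answer. Below is a minimal sketch of that voting step; the `self_consistency` helper and the example answers are illustrative, not taken from the paper.

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority-vote over final answers extracted from sampled completions."""
    counts = Counter(final_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Example: answers extracted from 64 sampled chain-of-thought completions.
samples = ["42"] * 40 + ["41"] * 15 + ["43"] * 9
print(self_consistency(samples))  # -> "42"
```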

Key innovations include:

  1. Harnessing publicly available web data at scale through a carefully engineered data selection pipeline.
  2. Introducing Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that strengthens mathematical reasoning while reducing PPO's memory footprint (see the sketch after this list).
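
The core idea in GRPO is to drop PPO's separate value (critic) model: for each question, a group of outputs is sampled and scored, and each output's advantage is its reward normalized against the group's mean and standard deviation. The sketch below shows that group-relative baseline together with a PPO-style clipped surrogate for a single token; function names and the example reward values are illustrative, not the authors' implementation.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: normalize each sampled output's reward by
    the group mean/std, replacing PPO's learned value-function baseline."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    """PPO-style clipped objective for one token, reused by GRPO."""
    ratio = np.exp(logp_new - logp_old)
    return min(ratio * advantage, np.clip(ratio, 1 - eps, 1 + eps) * advantage)

# Example: one question, a group of G = 4 sampled solutions scored by a
# reward model (hypothetical correctness-style scores).
rewards = np.array([1.0, 0.0, 1.0, 0.0])
advantages = grpo_advantages(rewards)  # baseline comes from the group itself
print(advantages)                      # roughly [ 1., -1.,  1., -1.]
```

Because the baseline is computed from the sampled group rather than a second trained model, the critic network and its optimizer state disappear, which is where the memory savings come from.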

Read the full paper on arXiv: https://arxiv.org/html/2402.03300v3
