DeepSeek-R1 (May 2025)
by DeepSeek · Released 2024-01-01
- Average score: 48.5
- Input price: N/A
- Output price: N/A
- Context window: N/A
- Type: text
Tested on 11 benchmarks, with an average score of 48.5%. Top scores: MATH level 5 (96.6%), Fiction.LiveBench (75.0%), and Aider polyglot (71.4%).
Benchmark scores
| Benchmark | Category | Score (%) |
|---|---|---|
| MATH level 5 | math | 96.6 |
| Fiction.LiveBench | knowledge | 75.0 |
| Aider polyglot | coding | 71.4 |
| GPQA diamond | knowledge | 68.4 |
| OTIS Mock AIME 2024-2025 | math | 66.4 |
| WeirdML | coding | 41.6 |
| DeepResearch Bench | knowledge | 35.1 |
| SimpleBench | reasoning | 29.0 |
| SimpleQA Verified | knowledge | 27.4 |
| ARC-AGI | reasoning | 21.2 |
| ARC-AGI-2 | reasoning | 1.1 |
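The page does not say how the 48.5% average is computed. An unweighted arithmetic mean of the 11 table scores (533.2 / 11 ≈ 48.47) rounds to the reported value, though. The sketch below assumes that aggregation and also derives the three "top scores". The dictionary and the helper name `unweighted_mean` are illustrative and not part of any leaderboard API.

```python
# Scores copied from the benchmark table (percent).
scores = {
    "MATH level 5": 96.6,
    "Fiction.LiveBench": 75.0,
    "Aider polyglot": 71.4,
    "GPQA diamond": 68.4,
    "OTIS Mock AIME 2024-2025": 66.4,
    "WeirdML": 41.6,
    "DeepResearch Bench": 35.1,
    "SimpleBench": 29.0,
    "SimpleQA Verified": 27.4,
    "ARC-AGI": 21.2,
    "ARC-AGI-2": 1.1,
}


def unweighted_mean(values):
    """Arithmetic mean that weights every benchmark equally (assumed aggregation)."""
    values = list(values)
    return sum(values) / len(values)


average = unweighted_mean(scores.values())

# The three highest-scoring benchmarks, in descending order of score.
top3 = sorted(scores, key=scores.get, reverse=True)[:3]

print(f"{len(scores)} benchmarks, mean = {average:.1f}%")
print("Top scores:", ", ".join(f"{name} ({scores[name]}%)" for name in top3))
```

Because the mean weights every benchmark equally, a single near-zero result such as ARC-AGI-2 pulls the average down as much as any other benchmark would.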
Similar models
- Baichuan2-13B (provider unknown): 48.4
- Alibaba Qwen: 48.0
- xAI: 47.8
- Meta: 49.3