R1 0528
Open Source, by DeepSeek · Published 2025-05-28
Avg. score: 57.9
Input price: $0.50 / 1M tokens
Output price: $2.15 / 1M tokens
Context window: 164K tokens (~82 books)
Type: text
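As a rough illustration of what the listed prices mean per request, the sketch below multiplies token counts by the input and output prices given above. The helper name `estimate_cost` and the example token counts are hypothetical, and real billing may differ (caching discounts, provider markups, and so on are not covered by this page).

```python
# Prices copied from the listing above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.15


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 100,000-token prompt with a 20,000-token answer (made-up sizes).
print(f"${estimate_cost(100_000, 20_000):.4f}")
```

Output tokens cost a little over four times as much as input tokens here, so long generated answers tend to dominate the bill more than long prompts do.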
Tested on 25 benchmarks, with a 57.9% average. Top scores: Chatbot Arena Elo — Overall (1421.7, an Elo rating rather than a percentage), MATH level 5 (96.6%), OpenCompass — AIME2025 (89.0%).
Benchmark results
| Benchmark | Category | Score |
|---|---|---|
| Chatbot Arena Elo — Overall | arena | 1421.7 (Elo) |
| MATH level 5 | math | 96.6 |
| OpenCompass — AIME2025 | math | 89.0 |
| OpenCompass — MMLU-Pro | knowledge | 83.5 |
| HELM — WildBench | reasoning | 82.8 |
| OpenCompass — GPQA-Diamond | knowledge | 80.6 |
| OpenCompass — IFEval | language | 80.0 |
| HELM — MMLU-Pro | knowledge | 79.3 |
| HELM — IFEval | language | 78.4 |
| Aider polyglot | coding | 71.4 |
| GPQA diamond | knowledge | 68.4 |
| HELM — GPQA | knowledge | 66.6 |
| OTIS Mock AIME 2024-2025 | math | 66.4 |
| OpenCompass — LiveCodeBenchV6 | coding | 61.0 |
| HELM — Omni-MATH | math | 42.4 |
| WeirdML | coding | 41.6 |
| DeepResearch Bench | knowledge | 35.1 |
| SimpleBench | reasoning | 29.0 |
| SimpleQA Verified | knowledge | 27.4 |
| Artificial Analysis — Quality Index | speed | 27.1 |
| Artificial Analysis — Coding Index | speed | 24.0 |
| ARC-AGI | reasoning | 21.2 |
| Artificial Analysis — Agentic Index | speed | 20.8 |
| OpenCompass — HLE | knowledge | 14.4 |
| ARC-AGI-2 | reasoning | 1.1 |
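
To compare categories at a glance, the scores in the table above can be grouped and averaged. The sketch below is only an illustration: the function name `category_means` is hypothetical, the rows are copied from the table, and the Elo rating is left out on the assumption that it is not on the same 0 to 100 scale as the other scores. A plain mean of the remaining 24 scores comes out near 53.7, not the 57.9 shown on the page, so the page presumably uses a different (unstated) aggregation.

```python
from collections import defaultdict
from statistics import mean

# (category, score) pairs copied from the benchmark table above.
ROWS = [
    ("arena", 1421.7),
    ("math", 96.6), ("math", 89.0), ("math", 66.4), ("math", 42.4),
    ("knowledge", 83.5), ("knowledge", 80.6), ("knowledge", 79.3),
    ("knowledge", 68.4), ("knowledge", 66.6), ("knowledge", 35.1),
    ("knowledge", 27.4), ("knowledge", 14.4),
    ("reasoning", 82.8), ("reasoning", 29.0), ("reasoning", 21.2),
    ("reasoning", 1.1),
    ("language", 80.0), ("language", 78.4),
    ("coding", 71.4), ("coding", 61.0), ("coding", 41.6),
    ("speed", 27.1), ("speed", 24.0), ("speed", 20.8),
]


def category_means(rows, skip=("arena",)):
    """Average the scores per category, ignoring categories in `skip`."""
    grouped = defaultdict(list)
    for category, score in rows:
        if category not in skip:
            grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


means = category_means(ROWS)
# Unweighted mean over every non-Elo row (an assumption, not the site's method).
overall = mean(score for category, score in ROWS if category != "arena")

for category, value in sorted(means.items(), key=lambda kv: -kv[1]):
    print(f"{category:<10} {value:6.1f}")
print(f"{'overall':<10} {overall:6.1f}")
```

Averaging per category rather than per row keeps the eight knowledge benchmarks from outweighing the two language ones when the categories are compared side by side.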