
DeepSeek-R1 (May 2025)

by DeepSeek · Released 2024-01-01

Average score: 48.5
Input price: N/A
Output price: N/A
Context window: N/A
Type: text

Tested on 11 benchmarks, with an average score of 48.5%. Top scores: MATH level 5 (96.6%), Fiction.LiveBench (75.0%), Aider polyglot (71.4%).

Benchmark scores

Benchmark	Category	Score (%)
MATH level 5	math	96.6
Fiction.LiveBench	knowledge	75.0
Aider polyglot	coding	71.4
GPQA diamond	knowledge	68.4
OTIS Mock AIME 2024-2025	math	66.4
WeirdML	coding	41.6
DeepResearch Bench	knowledge	35.1
SimpleBench	reasoning	29.0
SimpleQA Verified	knowledge	27.4
ARC-AGI	reasoning	21.2
ARC-AGI-2	reasoning	1.1
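
The 48.5% average quoted above can be reproduced from the table. The following is a minimal sketch, assuming the average is an unweighted arithmetic mean over all 11 rows; the page does not state how it is computed, and the variable names here are illustrative only.

```python
# Scores copied from the benchmark table above (percent).
scores = {
    "MATH level 5": 96.6,
    "Fiction.LiveBench": 75.0,
    "Aider polyglot": 71.4,
    "GPQA diamond": 68.4,
    "OTIS Mock AIME 2024-2025": 66.4,
    "WeirdML": 41.6,
    "DeepResearch Bench": 35.1,
    "SimpleBench": 29.0,
    "SimpleQA Verified": 27.4,
    "ARC-AGI": 21.2,
    "ARC-AGI-2": 1.1,
}

# Assumption: unweighted mean across all benchmarks.
average = sum(scores.values()) / len(scores)

# Three highest-scoring benchmarks, highest first.
top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]

print(f"{len(scores)} benchmarks, average {average:.1f}%")
for name, score in top3:
    print(f"{name}: {score}%")
```

Under that assumption the scores sum to 533.2, which divided by 11 gives about 48.47 and rounds to the reported 48.5.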