OpenCompass · AIME2025

Updated: 2026-02-16
Models tested: 32
Top score: 96.0 (DeepSeek V3.2 Speciale)
Median: 87.3 (min 22.4)
Top-5 spread: σ 0.7 (settled)
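
A minimal sketch of how headline figures like these could be derived from a list of scores. The function name `summarize` and the toy scores are hypothetical, and the choice of population standard deviation for the top-5 spread is an assumption, since the page does not say which σ it reports or how the "settled" label is assigned.

```python
from statistics import median, pstdev


def summarize(scores, top_n=5):
    """Return headline stats for a list of benchmark scores (0 to 100)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    return {
        "models_tested": len(ranked),
        "top_score": ranked[0],
        "median": median(ranked),
        "min": ranked[-1],
        # Assumption: population sigma over the top-N scores.
        "top_spread": pstdev(ranked[:top_n]),
    }


# Toy scores for illustration only, not real leaderboard data.
toy_scores = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
stats = summarize(toy_scores)
print(stats)
```

A small top-5 spread means the leading models are nearly tied, which is one signal that a benchmark has little headroom left.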

Best score over time · one chart, every benchmark

[Chart: OpenCompass · AIME2025 · 32 models · frontier running max. Y-axis: score, 0 to 100 (higher is better). X-axis: release date, Mar 25 to Feb 26. Source: benchgecko.ai/benchmark/oc-aime2025]
Frontier on OpenCompass · AIME2025 rose from 22.4 to 96.0 in 9 months · +73.6 points · latest leader DeepSeek V3.2 Speciale from DeepSeek.
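
A frontier running max of this kind can be sketched as follows: walk the models in release order and keep only the entries that beat every earlier score. The function name `frontier_records`, the tuple layout, and the toy entries are hypothetical and do not reflect how BenchGecko stores its data.

```python
from datetime import date


def frontier_records(results):
    """results: iterable of (release_date, model_name, score).

    Returns the entries that set a new best score, in release order.
    """
    records = []
    best = float("-inf")
    for released, name, score in sorted(results, key=lambda r: r[0]):
        if score > best:  # ties do not count as a new record
            best = score
            records.append((released, name, score))
    return records


# Toy entries for illustration only, not real leaderboard data.
toy = [
    (date(2025, 3, 1), "model-a", 40.0),
    (date(2025, 5, 1), "model-b", 35.0),
    (date(2025, 8, 1), "model-c", 70.0),
    (date(2025, 9, 1), "model-d", 70.0),
    (date(2025, 12, 1), "model-e", 90.0),
]
records = frontier_records(toy)
gain = records[-1][2] - records[0][2]
print([name for _, name, _ in records], gain)
```

The gain reported for a frontier is the last record minus the first, which is how a figure such as +73.6 points (96.0 minus 22.4) arises.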
Pink dots = frontier records · 7 total

Where models cluster

Score distribution · models per score bucket (0 to 100) · median 87.3

0–10: 0
10–20: 0
20–30: 1
30–40: 0
40–50: 1
50–60: 0
60–70: 7
70–80: 3
80–90: 8
90–100: 12

Source: benchgecko.ai
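
Bucket counts for a score distribution like this one could be computed with a sketch along these lines. The function name `bucket_counts` and the toy scores are hypothetical, and the boundary convention (each bucket includes its lower edge, with a score of exactly 100 placed in the last bucket) is an assumption the page does not state.

```python
def bucket_counts(scores, width=10, top=100):
    """Count scores per fixed-width bucket over the range 0 to `top`."""
    n_buckets = top // width
    counts = [0] * n_buckets
    for s in scores:
        if not 0 <= s <= top:
            raise ValueError(f"score out of range: {s}")
        # Assumption: a score equal to `top` belongs to the last bucket.
        idx = min(int(s // width), n_buckets - 1)
        counts[idx] += 1
    return counts


# Toy scores for illustration only, not real leaderboard data.
toy_scores = [22.4, 45.0, 90.0, 96.0, 100.0]
counts = bucket_counts(toy_scores)
for i, c in enumerate(counts):
    print(f"{i * 10}-{(i + 1) * 10}: {c}")
```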

Pearson r · original research

32 models tested · sorted by score

Pulled from the OpenCompass · AIME2025 dataset · updated daily

What does OpenCompass · AIME2025 measure?

OpenCompass · AIME2025 is a knowledge benchmark in the BenchGecko catalog. 32 AI models have been tested on it. Scores range from 22.4 to 96.0 out of 100.

Which model leads on OpenCompass · AIME2025?

DeepSeek V3.2 Speciale from DeepSeek leads OpenCompass · AIME2025 with a score of 96.0. The median score across 32 tested models is 87.3.

Is OpenCompass · AIME2025 saturated?

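Yes · the top model on OpenCompass · AIME2025 has reached 96.0 out of 100, which leaves 4 points of headroom and puts it within 5% of the theoretical ceiling. The top-5 spread is only σ 0.7, so the leading models are nearly tied. The benchmark is close to saturation and may be replaced by a harder successor.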
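
The ceiling check described here is simple arithmetic and can be sketched as below. The function name `is_near_ceiling` is hypothetical, and treating the 5% figure as a fraction of the maximum score is an assumption about how the threshold is applied.

```python
def is_near_ceiling(top_score, ceiling=100.0, tolerance=0.05):
    """True when the top score is within `tolerance` of the ceiling."""
    headroom = ceiling - top_score
    return headroom <= tolerance * ceiling


# 96.0 leaves 4.0 points of headroom, which is under the 5-point threshold.
print(is_near_ceiling(96.0))
```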

Does OpenCompass · AIME2025 predict performance on other benchmarks?

Yes · OpenCompass · AIME2025 scores have a Pearson correlation of 0.94 with GPQA Diamond scores across the 10 models tested on both benchmarks. Models that do well on OpenCompass · AIME2025 tend to do well on GPQA Diamond, although 10 shared models is a small sample.
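
A correlation over shared models can be sketched as follows: restrict both score tables to the models that appear in each, then compute Pearson r on the paired scores. The function name `pearson_on_shared`, the dictionary layout, and the toy scores are hypothetical and are not BenchGecko's actual pipeline.

```python
from math import sqrt


def pearson_on_shared(a, b):
    """a, b: dicts mapping model name -> score on two benchmarks.

    Returns (r, n) where n is the number of models present in both.
    """
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("scores are constant on one benchmark")
    return cov / sqrt(vx * vy), n


# Toy scores for illustration only, not real leaderboard data.
bench_a = {"m1": 1.0, "m2": 2.0, "m3": 3.0, "only-a": 9.0}
bench_b = {"m1": 2.0, "m2": 4.0, "m3": 6.0, "only-b": 0.0}
r, n = pearson_on_shared(bench_a, bench_b)
print(r, n)
```

Reporting n alongside r matters because a correlation computed on a handful of shared models carries wide uncertainty.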

How often is OpenCompass · AIME2025 data refreshed?

BenchGecko pulls updates daily. New model scores on OpenCompass · AIME2025 appear as soon as they are published by Epoch AI or the model provider.
