OpenCompass · AIME2025
Correlated benchmarks
Benchmarks that track with OpenCompass · AIME2025 (Pearson r · original research)
Pearson correlation is computed across the models scored on both benchmarks. Values closer to 1 mean one benchmark more strongly predicts the other.
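As a minimal sketch of that computation, the snippet below pairs scores by model name (keeping only models present in both benchmarks) and applies the textbook Pearson formula. The helper names `paired_scores` and `pearson_r` are hypothetical, the example scores are invented for illustration, and BenchGecko's actual pipeline is not described here. Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.

```python
import math


def paired_scores(a: dict[str, float], b: dict[str, float]) -> tuple[list[float], list[float]]:
    """Keep only models scored on both benchmarks, in a stable (sorted) order."""
    shared = sorted(a.keys() & b.keys())
    return [a[m] for m in shared], [b[m] for m in shared]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a benchmark has constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: model_d has no score on the second benchmark, so it is dropped.
aime = {"model_a": 90.0, "model_b": 80.0, "model_c": 70.0, "model_d": 60.0}
other = {"model_a": 75.0, "model_b": 65.0, "model_c": 55.0}
xs, ys = paired_scores(aime, other)
print(len(xs), round(pearson_r(xs, ys), 3))
```

With only a handful of shared models, a single outlier can move r substantially, which is why the number of shared models is worth reading alongside the coefficient.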
Full rankings
32 models tested · sorted by score
| # | Model | Score (out of 100) |
|---|---|---|
| 1 | DeepSeek V3.2 Speciale | 96.0 |
| 2 | GLM 5 | 95.8 |
| 3 | Step 3.5 Flash | 95.7 |
| 4 | GLM 4.7 | 95.4 |
| 5 | Kimi K2 Thinking | 94.1 |
| 6 | | 93.4 |
| 7 | | 93.0 |
| 8 | | 92.9 |
| 9 | | 92.3 |
| 10 | | 91.9 |
| 11 | | 90.9 |
| 12 | | 90.3 |
| 13 | | 89.0 |
| 14 | | 89.0 |
| 15 | | 88.7 |
| 16 | | 87.9 |
| 17 | | 86.8 |
| 18 | | 86.2 |
| 19 | | 85.8 |
| 20 | | 80.0 |
| 21 | | 79.1 |
| 22 | | 76.2 |
| 23 | | 70.3 |
| 24 | | 69.5 |
| 25 | | 69.2 |
| 26 | | 68.7 |
| 27 | | 66.2 |
| 28 | | 65.7 |
| 29 | | 63.8 |
| 30 | | 61.0 |
| 31 | | 46.9 |
| 32 | | 22.4 |
Frequently asked
Pulled from the OpenCompass · AIME2025 dataset · updated daily
What does OpenCompass · AIME2025 measure?
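OpenCompass · AIME2025 scores models on competition-math problems from the 2025 American Invitational Mathematics Examination (AIME); the BenchGecko catalog files it under knowledge benchmarks. 32 AI models have been tested on it, with scores ranging from 22.4 to 96.0 out of 100.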
Which model leads on OpenCompass · AIME2025?
DeepSeek V3.2 Speciale from DeepSeek leads OpenCompass · AIME2025 with a score of 96.0. The median score across 32 tested models is 87.3.
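The count, range and median quoted in these answers can be checked against the 32 scores in the full rankings table. The sketch below uses only Python's standard `statistics` module; the `summarize` helper is a hypothetical name. With an even number of models, the median is the midpoint of the 16th (87.9) and 17th (86.8) scores, roughly 87.35, which the page displays as 87.3.

```python
import statistics

# The 32 scores from the full rankings table, highest first.
SCORES = [
    96.0, 95.8, 95.7, 95.4, 94.1, 93.4, 93.0, 92.9, 92.3, 91.9,
    90.9, 90.3, 89.0, 89.0, 88.7, 87.9, 86.8, 86.2, 85.8, 80.0,
    79.1, 76.2, 70.3, 69.5, 69.2, 68.7, 66.2, 65.7, 63.8, 61.0,
    46.9, 22.4,
]


def summarize(scores: list[float]) -> dict[str, float]:
    """Count, range and median of a list of benchmark scores."""
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        # For an even count, statistics.median averages the two middle values.
        "median": statistics.median(scores),
    }


print(summarize(SCORES))
```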
Is OpenCompass · AIME2025 saturated?
Yes · the top model on OpenCompass · AIME2025 has reached 96.0 out of 100, only 4.0 points below the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
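A saturation check of this kind reduces to comparing the top score's headroom against a margin. The page does not state BenchGecko's exact rule, so the 5-point margin and the `headroom` and `is_saturated` helpers below are assumptions made for illustration only.

```python
CEILING = 100.0
# Assumed threshold: "saturated" means within 5 points of the ceiling.
SATURATION_MARGIN = 5.0


def headroom(top_score: float, ceiling: float = CEILING) -> float:
    """Points remaining between the best score and the maximum possible score."""
    return ceiling - top_score


def is_saturated(top_score: float, margin: float = SATURATION_MARGIN,
                 ceiling: float = CEILING) -> bool:
    """True when the best score sits within `margin` points of the ceiling."""
    return headroom(top_score, ceiling) <= margin


print(headroom(96.0), is_saturated(96.0))
```

Under this assumed rule, the current leader (96.0) leaves 4.0 points of headroom and counts as saturated, while a top score of 90.9 would not.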
Does OpenCompass · AIME2025 predict performance on other benchmarks?
Yes · OpenCompass · AIME2025 scores have a Pearson correlation of 0.94 with GPQA Diamond across the 10 models scored on both. Models that do well on OpenCompass · AIME2025 tend to do well on GPQA Diamond.
How often is OpenCompass · AIME2025 data refreshed?
BenchGecko pulls updates daily. New model scores on OpenCompass · AIME2025 appear in the next daily update after they are published by Epoch AI or the model provider.
Top on OpenCompass · AIME2025
DeepSeek V3.2 Speciale · 96.0
GLM 5 · 95.8
Step 3.5 Flash · 95.7
GLM 4.7 · 95.4
Kimi K2 Thinking · 94.1
More knowledge benchmarks
Same category · related evaluations