FrontierMath-2025-02-28-Private
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
Full rankings
54 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Pro | 50.0 |
| 2 | GPT-5.4 | 47.6 |
| 3 | Claude Opus 4.6 | 40.7 |
| 4 | GPT-5.2 | 40.7 |
| 5 | Muse Spark | 39.0 |
| 6 | | 37.6 |
| 7 | | 36.9 |
| 8 | | 35.6 |
| 9 | | 32.4 |
| 10 | | 32.4 |
| 11 | | 31.0 |
| 12 | | 27.9 |
| 13 | | 27.2 |
| 14 | | 24.8 |
| 15 | | 22.1 |
| 16 | | 21.4 |
| 17 | | 20.7 |
| 18 | | 19.7 |
| 19 | | 18.7 |
| 20 | | 16.4 |
| 21 | | 15.2 |
| 22 | | 14.1 |
| 23 | | 12.4 |
| 24 | | 9.3 |
| 25 | | 8.5 |
| 26 | | 8.3 |
| 27 | | 7.2 |
| 28 | | 5.9 |
| 29 | | 5.9 |
| 30 | | 5.5 |
| 31 | | 4.8 |
| 32 | | 4.5 |
| 33 | | 4.5 |
| 34 | | 4.1 |
| 35 | | 4.1 |
| 36 | | 3.8 |
| 37 | | 3.8 |
| 38 | | 2.4 |
| 39 | | 1.7 |
| 40 | | 1.7 |
| 41 | | 1.7 |
| 42 | | 1.0 |
| 43 | | 1.0 |
| 44 | | 1.0 |
| 45 | | 0.7 |
| 46 | | 0.7 |
| 47 | | 0.3 |
| 48 | | 0.3 |
| 49 | | 0.3 |
| 50 | | 0.3 |
| 51 | | 0.3 |
| 52 | | 0.3 |
| 53 | | 0.1 |
| 54 | | 0.1 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with FrontierMath-2025-02-28-Private
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
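As a rough illustration of how such a figure can be computed, the sketch below restricts two score tables to the models they share and then applies the textbook Pearson formula. The function names and the model names in the usage note are hypothetical; this is not BenchGecko's actual pipeline.

```python
from math import sqrt


def shared_scores(bench_a, bench_b):
    """Return paired score lists for models present in both benchmarks."""
    shared = sorted(bench_a.keys() & bench_b.keys())
    return [bench_a[m] for m in shared], [bench_b[m] for m in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

For example, with made-up scores `{"model_a": 10.0, "model_b": 20.0, "model_c": 30.0}` on one benchmark and `{"model_a": 1.0, "model_b": 2.0, "model_c": 3.0, "model_d": 9.0}` on another, only the three shared models are compared, and because their scores line up exactly the result is 1 up to floating-point rounding.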
Frequently asked
About FrontierMath-2025-02-28-Private
What does FrontierMath-2025-02-28-Private measure?
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning. 54 AI models have been tested on it. Scores range from 0.1 to 50.0 out of 100.
Which model leads on FrontierMath-2025-02-28-Private?
GPT-5.4 Pro from OpenAI leads FrontierMath-2025-02-28-Private with a score of 50.0. The median score across 54 tested models is 6.6.
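The median quoted above can be checked against the rankings table. A minimal Python sketch, assuming the 54 scores listed on this page are the complete set: with an even count, the median is the mean of the two middle values (5.9 and 7.2), which is about 6.55 and is reported on the page as 6.6.

```python
from statistics import median

# Scores copied from the rankings table on this page, highest first.
scores = [
    50.0, 47.6, 40.7, 40.7, 39.0, 37.6, 36.9, 35.6, 32.4, 32.4,
    31.0, 27.9, 27.2, 24.8, 22.1, 21.4, 20.7, 19.7, 18.7, 16.4,
    15.2, 14.1, 12.4, 9.3, 8.5, 8.3, 7.2, 5.9, 5.9, 5.5,
    4.8, 4.5, 4.5, 4.1, 4.1, 3.8, 3.8, 2.4, 1.7, 1.7,
    1.7, 1.0, 1.0, 1.0, 0.7, 0.7, 0.3, 0.3, 0.3, 0.3,
    0.3, 0.3, 0.1, 0.1,
]

top = max(scores)     # leading score
mid = median(scores)  # even count: mean of the 27th and 28th values

print(f"models={len(scores)} top={top} median={mid:.2f}")
```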
Is FrontierMath-2025-02-28-Private saturated?
No · the top score is 50.0 out of 100 (50%). There is still meaningful room for improvement on FrontierMath-2025-02-28-Private.
Does FrontierMath-2025-02-28-Private predict performance on other benchmarks?
Yes · FrontierMath-2025-02-28-Private scores have a Pearson correlation of 0.94 with Artificial Analysis · Quality Index across the 12 models scored on both. Models that do well on FrontierMath-2025-02-28-Private tend to do well on Artificial Analysis · Quality Index.
How often is FrontierMath-2025-02-28-Private data refreshed?
BenchGecko pulls updates daily. New model scores on FrontierMath-2025-02-28-Private appear as soon as they are published by Epoch AI or the model provider.
- Category: Math
- Max score: 100
- Models: 54
- Updated: 2026-03-05
Top on FrontierMath-2025-02-28-Private
- GPT-5.4 Pro · 50.0
- GPT-5.4 · 47.6
- Claude Opus 4.6 · 40.7
- GPT-5.2 · 40.7
- Muse Spark · 39.0
More math benchmarks
Same category · related evaluations