ARC-AGI
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
The Frontier
Best score over time · one chart, every benchmark
Full rankings
49 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 98.0 |
| 2 | GPT-5.5 | 95.0 |
| 3 | GPT-5.4 Pro | 94.5 |
| 4 | Claude Opus 4.6 | 94.0 |
| 5 | GPT-5.4 | 93.7 |
| 6 | | 90.5 |
| 7 | | 86.5 |
| 8 | | 86.2 |
| 9 | | 80.0 |
| 10 | | 75.0 |
| 11 | | 72.8 |
| 12 | | 70.2 |
| 13 | | 66.7 |
| 14 | | 65.7 |
| 15 | | 65.3 |
| 16 | | 63.7 |
| 17 | | 63.7 |
| 18 | | 60.8 |
| 19 | | 59.3 |
| 20 | | 58.7 |
| 21 | | 57.0 |
| 22 | | 54.3 |
| 23 | | 48.5 |
| 24 | | 47.7 |
| 25 | | 44.7 |
| 26 | | 41.0 |
| 27 | | 40.0 |
| 28 | | 35.7 |
| 29 | | 34.5 |
| 30 | | 32.3 |
| 31 | | 30.7 |
| 32 | | 28.6 |
| 33 | | 21.5 |
| 34 | | 21.2 |
| 35 | | 20.7 |
| 36 | | 18.0 |
| 37 | | 16.5 |
| 38 | | 15.8 |
| 39 | | 14.0 |
| 40 | | 11.0 |
| 41 | | 10.3 |
| 42 | | 5.5 |
| 43 | | 5.5 |
| 44 | Magistral Small 1.1 | 5.0 |
| 45 | | 4.5 |
| 46 | | 4.4 |
| 47 | | 3.5 |
| 48 | | 0.5 |
| 49 | | 0.1 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with ARC-AGI
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
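As a rough illustration of the calculation described above, the sketch below computes Pearson's r over only the models that appear on both benchmarks. The function names, the dictionary layout, and the sample scores are hypothetical placeholders, not BenchGecko's code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def correlate_shared(bench_a, bench_b):
    """Correlate two {model: score} dicts over the models scored on both.

    Returns (r, number_of_shared_models).
    """
    shared = sorted(set(bench_a) & set(bench_b))
    xs = [bench_a[m] for m in shared]
    ys = [bench_b[m] for m in shared]
    return pearson_r(xs, ys), len(shared)


# Hypothetical scores, for illustration only.
benchmark_a = {"model-1": 10.0, "model-2": 20.0, "model-3": 30.0, "model-4": 99.0}
benchmark_b = {"model-1": 1.0, "model-2": 2.0, "model-3": 3.0, "model-5": 7.0}

r, shared_count = correlate_shared(benchmark_a, benchmark_b)
print(f"r = {r:.2f} across {shared_count} shared models")
```

Restricting the calculation to shared models matters: a model scored on only one benchmark contributes nothing to the pairing and is dropped before r is computed.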
Frequently asked
About ARC-AGI
What does ARC-AGI measure?
ARC-AGI is the original Abstraction and Reasoning Corpus. It tests whether AI can solve novel visual pattern recognition tasks without memorization. 49 AI models have been tested on it, with scores ranging from 0.1 to 98.0 out of 100.
Which model leads on ARC-AGI?
Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 49 tested models is 44.7.
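The median quoted above can be checked against the 49 scores in the full rankings. A minimal sketch, with the scores copied from the table and the variable names chosen here for illustration:

```python
from statistics import median

# The 49 scores from the full rankings, highest first.
scores = [
    98.0, 95.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0,
    72.8, 70.2, 66.7, 65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7,
    57.0, 54.3, 48.5, 47.7, 44.7, 41.0, 40.0, 35.7, 34.5, 32.3,
    30.7, 28.6, 21.5, 21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0,
    10.3, 5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

# With an odd number of entries, the median is the middle one (rank 25 of 49).
median_score = median(scores)
print(len(scores), median_score)  # prints "49 44.7"
```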
Is ARC-AGI saturated?
Yes · the top model on ARC-AGI has reached 98.0 out of 100, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
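One way to read the "within 5% of the ceiling" test is as a relative gap between the top score and the maximum score. The sketch below assumes that reading; the function name and the exact rule are illustrative assumptions, not necessarily how this page decides saturation.

```python
def is_near_ceiling(top_score, max_score=100.0, tolerance=0.05):
    """Return True if top_score is within `tolerance` of max_score.

    Assumption: the gap is measured relative to max_score, so a tolerance of
    0.05 on a 100-point scale means a top score of 95 or higher.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    gap = (max_score - top_score) / max_score
    return gap <= tolerance


# 98.0 out of 100 leaves a 2% gap, which is inside the 5% tolerance.
print(is_near_ceiling(98.0))  # prints "True"
```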
Does ARC-AGI predict performance on other benchmarks?
Yes · ARC-AGI scores have a Pearson correlation of 0.94 with Cybench scores across the 13 models scored on both. Models that do well on ARC-AGI tend to do well on Cybench.
How often is ARC-AGI data refreshed?
BenchGecko pulls updates daily. New model scores on ARC-AGI appear as soon as they are published by Epoch AI or the model provider.
- Category: Reasoning
- Max score: 100
- Models: 49
- Updated: 2026-04-23
Top on ARC-AGI
- Gemini 3.1 Pro Preview · 98.0
- GPT-5.5 · 95.0
- GPT-5.4 Pro · 94.5
- Claude Opus 4.6 · 94.0
- GPT-5.4 · 93.7
More reasoning benchmarks
Same category · related evaluations