ARC-AGI
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
The Frontier
Best score over time · one chart, every benchmark
Full leaderboard
48 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 98.0 |
| 2 | | 94.5 |
| 3 | | 94.0 |
| 4 | | 93.7 |
| 5 | | 90.5 |
| 6 | | 86.5 |
| 7 | | 86.2 |
| 8 | | 80.0 |
| 9 | | 75.0 |
| 10 | | 72.8 |
| 11 | | 70.2 |
| 12 | | 66.7 |
| 13 | | 65.7 |
| 14 | | 65.3 |
| 15 | | 63.7 |
| 16 | | 63.7 |
| 17 | | 60.8 |
| 18 | | 59.3 |
| 19 | | 58.7 |
| 20 | | 57.0 |
| 21 | | 54.3 |
| 22 | | 48.5 |
| 23 | | 47.7 |
| 24 | | 44.7 |
| 25 | | 41.0 |
| 26 | | 40.0 |
| 27 | | 35.7 |
| 28 | | 34.5 |
| 29 | | 32.3 |
| 30 | | 30.7 |
| 31 | | 28.6 |
| 32 | | 21.5 |
| 33 | | 21.2 |
| 34 | | 20.7 |
| 35 | | 18.0 |
| 36 | | 16.5 |
| 37 | | 15.8 |
| 38 | | 14.0 |
| 39 | | 11.0 |
| 40 | | 10.3 |
| 41 | | 5.5 |
| 42 | | 5.5 |
| 43 | Magistral Small 1.1 | 5.0 |
| 44 | | 4.5 |
| 45 | | 4.4 |
| 46 | | 3.5 |
| 47 | | 0.5 |
| 48 | | 0.1 |
Score distribution
Where models cluster
Related benchmarks
Pearson r · original research
Benchmarks that track with ARC-AGI
Pearson correlation computed across the models scored on both benchmarks. The closer r is to 1, the better a model's ARC-AGI score predicts its score on the other benchmark.
Frequently asked questions
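A minimal sketch of how such a correlation could be computed, assuming each benchmark is available as a mapping from model name to score. The helper names (`pearson_r`, `shared_scores`) and the sample scores are hypothetical and are not taken from this page's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

def shared_scores(a, b):
    """Keep only models scored on both benchmarks, in matching order."""
    models = sorted(set(a) & set(b))
    return [a[m] for m in models], [b[m] for m in models]

# Hypothetical scores for illustration only.
arc = {"model-a": 90.0, "model-b": 60.0, "model-c": 30.0, "model-d": 10.0}
other = {"model-a": 80.0, "model-b": 50.0, "model-c": 35.0, "model-e": 5.0}

xs, ys = shared_scores(arc, other)  # only model-a, model-b, model-c overlap
r = pearson_r(xs, ys)
print(f"r = {r:.2f} across {len(xs)} shared models")
```

Restricting the calculation to shared models matters: a model missing from either benchmark contributes nothing, so the sample size behind each r value can be much smaller than the full leaderboard.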
About ARC-AGI
What does ARC-AGI measure?
ARC-AGI is the original Abstraction and Reasoning Corpus. It tests whether AI can solve novel visual pattern recognition tasks without relying on memorization. 48 AI models have been tested on it, with scores ranging from 0.1 to 98.0 out of 100.
Which model leads on ARC-AGI?
Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 48 tested models is 42.8.
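The summary statistics can be recomputed from the leaderboard with a few lines of standard-library Python. This is a sketch that uses the 48 scores as displayed in the table. The displayed values are rounded to one decimal, so a median computed from them may differ slightly from one computed on unrounded source data.

```python
import statistics

# The 48 leaderboard scores as displayed (one decimal place).
scores = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0, 72.8,
    70.2, 66.7, 65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7, 57.0,
    54.3, 48.5, 47.7, 44.7, 41.0, 40.0, 35.7, 34.5, 32.3, 30.7,
    28.6, 21.5, 21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0, 10.3,
    5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

top = max(scores)
bottom = min(scores)
# With an even number of entries, the median is the mean of the two middle values.
median = statistics.median(scores)

print(f"models={len(scores)} top={top} bottom={bottom} median={median:.2f}")
```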
Is ARC-AGI saturated?
Yes · the top model on ARC-AGI has reached 98.0 out of 100, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
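One way to express the saturation check is sketched below. It assumes "within 5% of the theoretical ceiling" means the gap to the maximum score is at most 5% of that maximum. The function name `is_near_ceiling` and the `margin` parameter are hypothetical, and the page's exact rule may differ.

```python
def is_near_ceiling(top_score, max_score=100.0, margin=0.05):
    """Return True when the gap to max_score is at most margin * max_score.

    The 5% default mirrors the threshold quoted in the text; treating it as
    a fraction of the maximum score is an assumption.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return (max_score - top_score) <= margin * max_score

headroom = 100.0 - 98.0          # points left before the ceiling
saturated = is_near_ceiling(98.0)
print(f"headroom={headroom:.1f} points, near ceiling: {saturated}")
```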
Does ARC-AGI predict performance on other benchmarks?
Yes · ARC-AGI scores correlate 0.94 with Cybench across 13 shared models. Models that do well on ARC-AGI tend to do well on Cybench.
How often is ARC-AGI data refreshed?
BenchGecko pulls updates daily. New model scores on ARC-AGI appear as soon as they are published by Epoch AI or the model provider.
- Category: Reasoning
- Maximum score: 100
- Models: 48
- Updated: 2026-03-05
Other reasoning benchmarks
Same category · related evaluations