ARC-AGI
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
The Frontier
Best score over time · one chart, every benchmark
Full leaderboard
48 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 98.0 |
| 2 | | 94.5 |
| 3 | | 94.0 |
| 4 | | 93.7 |
| 5 | | 90.5 |
| 6 | | 86.5 |
| 7 | | 86.2 |
| 8 | | 80.0 |
| 9 | | 75.0 |
| 10 | | 72.8 |
| 11 | | 70.2 |
| 12 | | 66.7 |
| 13 | | 65.7 |
| 14 | | 65.3 |
| 15 | | 63.7 |
| 16 | | 63.7 |
| 17 | | 60.8 |
| 18 | | 59.3 |
| 19 | | 58.7 |
| 20 | | 57.0 |
| 21 | | 54.3 |
| 22 | | 48.5 |
| 23 | | 47.7 |
| 24 | | 44.7 |
| 25 | | 41.0 |
| 26 | | 40.0 |
| 27 | | 35.7 |
| 28 | | 34.5 |
| 29 | | 32.3 |
| 30 | | 30.7 |
| 31 | | 28.6 |
| 32 | | 21.5 |
| 33 | | 21.2 |
| 34 | | 20.7 |
| 35 | | 18.0 |
| 36 | | 16.5 |
| 37 | | 15.8 |
| 38 | | 14.0 |
| 39 | | 11.0 |
| 40 | | 10.3 |
| 41 | | 5.5 |
| 42 | | 5.5 |
| 43 | Magistral Small 1.1 | 5.0 |
| 44 | | 4.5 |
| 45 | | 4.4 |
| 46 | | 3.5 |
| 47 | | 0.5 |
| 48 | | 0.1 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with ARC-AGI
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
Frequently asked
About ARC-AGI
What does ARC-AGI measure?
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization. 48 AI models have been tested on it. Scores range from 0.1 to 98.0 out of 100.
Which model leads on ARC-AGI?
Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 48 tested models is 42.8.
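The median quoted above can be checked against the leaderboard. The following is a minimal sketch, assuming the 48 scores in the table are the full set the page uses; the variable names are illustrative and not part of BenchGecko.

```python
from statistics import median

# Scores copied from the leaderboard table, rank 1 through rank 48.
scores = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0, 72.8,
    70.2, 66.7, 65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7, 57.0,
    54.3, 48.5, 47.7, 44.7, 41.0, 40.0, 35.7, 34.5, 32.3, 30.7,
    28.6, 21.5, 21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0, 10.3,
    5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

# With an even number of entries, the median is the mean of the two
# middle values (ranks 24 and 25 here: 44.7 and 41.0).
median_score = median(scores)
print(f"models: {len(scores)}, median: {median_score:.2f}")
```

The two middle scores average to about 42.85, which agrees with the reported 42.8 up to rounding.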
Is ARC-AGI saturated?
Yes · the top model on ARC-AGI has reached 98.0 out of 100, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
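The saturation rule described above can be sketched as a simple threshold test. This assumes "within 5% of the theoretical ceiling" means the top score is at least 95% of the maximum; the function name and the `margin` parameter are hypothetical, not taken from the site.

```python
def is_near_ceiling(top_score: float, max_score: float = 100.0,
                    margin: float = 0.05) -> bool:
    """Return True if top_score lies within `margin` of max_score.

    Assumed rule: a benchmark is flagged as approaching saturation
    when top_score >= max_score * (1 - margin).
    """
    return top_score >= max_score * (1.0 - margin)


# ARC-AGI: top score 98.0 on a 0-100 scale.
print(is_near_ceiling(98.0))
```

Under this assumption the 98.0 top score clears the 95-point threshold by three points.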
Does ARC-AGI predict performance on other benchmarks?
Yes · ARC-AGI scores have a Pearson correlation of 0.94 with Cybench scores across the 13 models tested on both. Models that do well on ARC-AGI tend to do well on Cybench.
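For reference, a Pearson correlation between two benchmarks is computed over the models that have a score on both. The sketch below is a plain-Python version of the standard formula; the function name is illustrative, and the example inputs are toy values, not real ARC-AGI or Cybench scores.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of paired scores (one pair per shared model)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data only: a perfectly linear relationship gives r close to 1.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

With only 13 shared models, a high r indicates that the two rankings move together on this sample; it says nothing about why.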
How often is ARC-AGI data refreshed?
BenchGecko pulls updates daily. New model scores on ARC-AGI appear as soon as they are published by Epoch AI or the model provider.
- Category: Reasoning
- Max. score: 100
- Models: 48
- Updated: 2026-03-05