ARC-AGI-2
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, which tests pattern recognition and abstract reasoning on novel tasks that models have not seen in training.
The Frontier
Best score over time · one chart, every benchmark
Distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with ARC-AGI-2
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
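As a rough illustration of the statistic described above, here is a minimal Python sketch that computes Pearson r over the models two benchmarks have in common. The model names and scores in `scores_a` and `scores_b` are hypothetical placeholders, not values from this page, and the helper `pearson_r` is written for this example rather than taken from the site's pipeline.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variance numerators).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-model scores on two benchmarks (placeholders only).
scores_a = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 45.0}
scores_b = {"model-b": 15.0, "model-c": 22.0, "model-d": 40.0, "model-e": 5.0}

# Only models scored on both benchmarks enter the correlation.
shared = sorted(scores_a.keys() & scores_b.keys())
r = pearson_r([scores_a[m] for m in shared], [scores_b[m] for m in shared])
print(f"{len(shared)} shared models, r = {r:.2f}")
```

Restricting to shared models matters: a correlation over a small overlap can look strong by chance, so the number of shared models is worth reading alongside r.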
Full rankings
50 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Pro | 83.3 |
| 2 |  | 77.1 |
| 3 |  | 74.0 |
| 4 |  | 69.2 |
| 5 |  | 60.4 |
| 6 |  | 54.2 |
| 7 |  | 52.9 |
| 8 |  | 37.6 |
| 9 |  | 33.6 |
| 10 |  | 31.1 |
| 11 |  | 18.3 |
| 12 |  | 17.6 |
| 13 |  | 16.0 |
| 14 |  | 13.6 |
| 15 |  | 11.8 |
| 16 |  | 9.9 |
| 17 |  | 8.6 |
| 18 |  | 6.5 |
| 19 |  | 6.1 |
| 20 |  | 5.9 |
| 21 |  | 5.3 |
| 22 |  | 4.9 |
| 23 |  | 4.9 |
| 24 |  | 4.9 |
| 25 |  | 4.9 |
| 26 |  | 4.4 |
| 27 |  | 4.0 |
| 28 |  | 4.0 |
| 29 |  | 3.0 |
| 30 |  | 2.6 |
| 31 |  | 2.5 |
| 32 |  | 1.3 |
| 33 |  | 1.3 |
| 34 |  | 1.3 |
| 35 |  | 1.1 |
| 36 |  | 0.9 |
| 37 |  | 0.8 |
| 38 |  | 0.8 |
| 39 |  | 0.8 |
| 40 |  | 0.4 |
| 41 |  | 0.4 |
| 42 |  | 0.1 |
| 43 |  | 0.1 |
| 44 |  | 0.1 |
| 45 |  | 0.1 |
| 46 |  | 0.1 |
| 47 |  | 0.1 |
| 48 |  | 0.1 |
| 49 |  | 0.1 |
| 50 | Magistral Small 1.1 | 0.1 |
Frequently asked
Pulled from the ARC-AGI-2 dataset · updated daily
What does ARC-AGI-2 measure?
ARC-AGI-2 is a reasoning benchmark in the BenchGecko catalog that tests novel pattern recognition and abstract reasoning. 50 AI models have been tested on it, with scores ranging from 0.1 to 83.3 out of 100.
Which model leads on ARC-AGI-2?
GPT-5.4 Pro from OpenAI leads ARC-AGI-2 with a score of 83.3. The median score across 50 tested models is 4.7.
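The median quoted above can be checked against the rankings table. The sketch below copies the 50 scores as displayed on this page and takes their median with Python's standard library. It assumes the displayed one-decimal values are close to the underlying data; the site may compute its median from unrounded scores.

```python
import statistics

# The 50 scores as displayed in the rankings table, highest first.
scores = [
    83.3, 77.1, 74.0, 69.2, 60.4, 54.2, 52.9, 37.6, 33.6, 31.1,
    18.3, 17.6, 16.0, 13.6, 11.8, 9.9, 8.6, 6.5, 6.1, 5.9,
    5.3, 4.9, 4.9, 4.9, 4.9, 4.4, 4.0, 4.0, 3.0, 2.6,
    2.5, 1.3, 1.3, 1.3, 1.1, 0.9, 0.8, 0.8, 0.8, 0.4,
    0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
]

# With an even count, the median is the mean of the two middle values
# (ranks 25 and 26 in the table: 4.9 and 4.4).
median_score = statistics.median(scores)
top_score = max(scores)

print(f"models: {len(scores)}, top: {top_score}, median: {median_score:.2f}")
```

The two middle values average to roughly 4.65, in line with the 4.7 reported above, and the gap between that and the top score of 83.3 shows how strongly the distribution is skewed toward low scores.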
Is ARC-AGI-2 saturated?
No · the top score is 83.3 out of 100 (83%). There is still meaningful room for improvement on ARC-AGI-2.
Does ARC-AGI-2 predict performance on other benchmarks?
Yes · ARC-AGI-2 scores have a Pearson correlation of 0.94 with GSO-Bench scores across the 16 models tested on both. Models that do well on ARC-AGI-2 tend to do well on GSO-Bench.
How often is ARC-AGI-2 data refreshed?
BenchGecko pulls updates daily, so new model scores on ARC-AGI-2 typically appear within a day of being published by Epoch AI or the model provider.