ARC-AGI-2
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
The Frontier
Best score over time · one chart, every benchmark
Full ranking
50 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | GPT-5.4 Pro | 83.3 |
| 2 |  | 77.1 |
| 3 |  | 74.0 |
| 4 |  | 69.2 |
| 5 |  | 60.4 |
| 6 |  | 54.2 |
| 7 |  | 52.9 |
| 8 |  | 37.6 |
| 9 |  | 33.6 |
| 10 |  | 31.1 |
| 11 |  | 18.3 |
| 12 |  | 17.6 |
| 13 |  | 16.0 |
| 14 |  | 13.6 |
| 15 |  | 11.8 |
| 16 |  | 9.9 |
| 17 |  | 8.6 |
| 18 |  | 6.5 |
| 19 |  | 6.1 |
| 20 |  | 5.9 |
| 21 |  | 5.3 |
| 22 |  | 4.9 |
| 23 |  | 4.9 |
| 24 |  | 4.9 |
| 25 |  | 4.9 |
| 26 |  | 4.4 |
| 27 |  | 4.0 |
| 28 |  | 4.0 |
| 29 |  | 3.0 |
| 30 |  | 2.6 |
| 31 |  | 2.5 |
| 32 |  | 1.3 |
| 33 |  | 1.3 |
| 34 |  | 1.3 |
| 35 |  | 1.1 |
| 36 |  | 0.9 |
| 37 |  | 0.8 |
| 38 |  | 0.8 |
| 39 |  | 0.8 |
| 40 |  | 0.4 |
| 41 |  | 0.4 |
| 42 |  | 0.1 |
| 43 |  | 0.1 |
| 44 |  | 0.1 |
| 45 |  | 0.1 |
| 46 |  | 0.1 |
| 47 |  | 0.1 |
| 48 |  | 0.1 |
| 49 |  | 0.1 |
| 50 | Magistral Small 1.1 | 0.1 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with ARC-AGI-2
Pearson correlation computed across the models scored on both benchmarks. The closer the value is to 1, the more strongly a model's score on one benchmark predicts its score on the other.
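As an illustration of how such a figure can be computed, here is a minimal Python sketch: it keeps only the models present in both score sets, then applies the standard Pearson formula. The function name `pearson_on_shared`, the model names, and all scores are hypothetical placeholders; this is not BenchGecko's actual pipeline or data.

```python
from math import sqrt


def pearson_on_shared(scores_a, scores_b):
    """Return (r, n): Pearson r over the n models present in both dicts."""
    # Only models scored on both benchmarks take part in the correlation.
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")

    xs = [scores_a[m] for m in shared]
    ys = [scores_b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Pearson r = covariance / (std_x * std_y); the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y), n


# Hypothetical scores, for illustration only.
bench_a = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 5.0}
bench_b = {"model-a": 15.0, "model-b": 25.0, "model-c": 35.0, "model-e": 50.0}

r, n = pearson_on_shared(bench_a, bench_b)
print(f"r = {r:.2f} across {n} shared models")
```

Returning the number of shared models alongside r matters because a high correlation over a small overlap is weaker evidence than the same value over many models.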
Frequently asked questions
About ARC-AGI-2
What does ARC-AGI-2 measure?
ARC-AGI-2 is the second iteration of the Abstraction and Reasoning Corpus. It tests novel pattern recognition and abstract reasoning without prior training data. 50 AI models have been tested on it, with scores ranging from 0.1 to 83.3 out of 100.
Which model leads on ARC-AGI-2?
GPT-5.4 Pro from OpenAI leads ARC-AGI-2 with a score of 83.3. The median score across 50 tested models is 4.7.
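The median quoted above can be checked against the ranking table. The following Python sketch copies the 50 scores from the table by hand and summarizes them with the standard library. The names `ARC_AGI_2_SCORES` and `summarize` are illustrative and not part of any BenchGecko API.

```python
from statistics import median

# Scores transcribed from the ranking table, rank 1 to rank 50.
ARC_AGI_2_SCORES = [
    83.3, 77.1, 74.0, 69.2, 60.4, 54.2, 52.9, 37.6, 33.6, 31.1,
    18.3, 17.6, 16.0, 13.6, 11.8, 9.9, 8.6, 6.5, 6.1, 5.9,
    5.3, 4.9, 4.9, 4.9, 4.9, 4.4, 4.0, 4.0, 3.0, 2.6,
    2.5, 1.3, 1.3, 1.3, 1.1, 0.9, 0.8, 0.8, 0.8, 0.4,
    0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
]


def summarize(scores):
    """Return count, min, max and median for a list of scores."""
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        # With an even count, median() averages the two middle values.
        "median": median(scores),
    }


summary = summarize(ARC_AGI_2_SCORES)
print(summary)
```

With an even number of models the median is the midpoint of the 25th and 26th ranked scores (4.9 and 4.4), which gives 4.65 and is consistent with the 4.7 reported once rounded to one decimal place.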
Is ARC-AGI-2 saturated?
No · the top score is 83.3 out of 100 (83%). There is still meaningful room for improvement on ARC-AGI-2.
Does ARC-AGI-2 predict performance on other benchmarks?
Yes · ARC-AGI-2 scores have a Pearson correlation of 0.94 with GSO-Bench scores across the 16 models tested on both. Models that do well on ARC-AGI-2 tend to do well on GSO-Bench.
How often is ARC-AGI-2 data refreshed?
BenchGecko pulls updates daily. New model scores on ARC-AGI-2 appear as soon as they are published by Epoch AI or the model provider.
- Category: Reasoning
- Maximum score: 100
- Models: 50
- Updated: 2026-03-05
Other reasoning benchmarks
Same category · related evaluations