ARC-AGI
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
The Frontier
Best score over time · one chart, every benchmark
Full ranking
48 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 98.0 |
| 2 | | 94.5 |
| 3 | | 94.0 |
| 4 | | 93.7 |
| 5 | | 90.5 |
| 6 | | 86.5 |
| 7 | | 86.2 |
| 8 | | 80.0 |
| 9 | | 75.0 |
| 10 | | 72.8 |
| 11 | | 70.2 |
| 12 | | 66.7 |
| 13 | | 65.7 |
| 14 | | 65.3 |
| 15 | | 63.7 |
| 16 | | 63.7 |
| 17 | | 60.8 |
| 18 | | 59.3 |
| 19 | | 58.7 |
| 20 | | 57.0 |
| 21 | | 54.3 |
| 22 | | 48.5 |
| 23 | | 47.7 |
| 24 | | 44.7 |
| 25 | | 41.0 |
| 26 | | 40.0 |
| 27 | | 35.7 |
| 28 | | 34.5 |
| 29 | | 32.3 |
| 30 | | 30.7 |
| 31 | | 28.6 |
| 32 | | 21.5 |
| 33 | | 21.2 |
| 34 | | 20.7 |
| 35 | | 18.0 |
| 36 | | 16.5 |
| 37 | | 15.8 |
| 38 | | 14.0 |
| 39 | | 11.0 |
| 40 | | 10.3 |
| 41 | | 5.5 |
| 42 | | 5.5 |
| 43 | U Magistral Small 1.1 | 5.0 |
| 44 | | 4.5 |
| 45 | | 4.4 |
| 46 | | 3.5 |
| 47 | | 0.5 |
| 48 | | 0.1 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with ARC-AGI
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
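For readers who want to reproduce this kind of figure, here is a minimal sketch of a Pearson correlation over paired scores. The function name `pearson_r` and the sample values are illustrative assumptions, not BenchGecko's code or data; the page does not say how its correlations are actually computed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists of scores.

    Each position is assumed to hold one model's score on benchmark X (xs)
    and on benchmark Y (ys), i.e. only models scored on both are included.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores for four shared models (not real data).
benchmark_a = [10.0, 40.0, 70.0, 95.0]
benchmark_b = [5.0, 30.0, 55.0, 80.0]
r = pearson_r(benchmark_a, benchmark_b)
```

A value near 1 means the two benchmarks rank and space models similarly; with few shared models the estimate can shift noticeably when a single model is added or removed.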
Frequently asked questions
About ARC-AGI
What does ARC-AGI measure?
ARC-AGI is the original Abstraction and Reasoning Corpus. It tests whether AI can solve novel visual pattern recognition tasks without relying on memorization. So far, 48 AI models have been tested on it, with scores ranging from 0.1 to 98.0 out of 100.
Which model leads on ARC-AGI?
Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 48 tested models is 42.8.
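The summary figures quoted here can be checked against the ranking table. The sketch below copies the 48 scores from that table and computes the count, range, and median with Python's standard library. The exact midpoint of the two middle scores (44.7 and 41.0) is 42.85; the page reports 42.8, and how it rounds or truncates is not stated, so that step is left out.

```python
from statistics import median

# The 48 scores from the ranking table, in rank order.
ARC_AGI_SCORES = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0,
    75.0, 72.8, 70.2, 66.7, 65.7, 65.3, 63.7, 63.7,
    60.8, 59.3, 58.7, 57.0, 54.3, 48.5, 47.7, 44.7,
    41.0, 40.0, 35.7, 34.5, 32.3, 30.7, 28.6, 21.5,
    21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0, 10.3,
    5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]


def summarize(scores):
    """Return the count, minimum, maximum, and median of a list of scores."""
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        # With an even count, median() averages the two middle values.
        "median": median(scores),
    }


summary = summarize(ARC_AGI_SCORES)
print(summary)
```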
Is ARC-AGI saturated?
Yes. The top model on ARC-AGI has reached 98.0 out of 100, which is within 5% of the theoretical ceiling. The benchmark is approaching saturation and may be replaced by a harder successor.
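The "within 5% of the ceiling" rule can be written as a one-line check. This is a sketch under assumptions: the function name `is_near_saturation` and its default arguments are hypothetical, and the page does not describe how BenchGecko applies the threshold internally.

```python
def is_near_saturation(top_score, ceiling=100.0, margin=0.05):
    """Return True when the top score is within `margin` of the ceiling.

    `margin` is a fraction of the ceiling, so 0.05 with a ceiling of 100
    means the top score must be 95 or higher.
    """
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return (ceiling - top_score) <= margin * ceiling


# Top ARC-AGI score reported on this page.
print(is_near_saturation(98.0))
```

With the reported top score of 98.0, the gap to the ceiling is 2 points, well inside the 5-point margin.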
Does ARC-AGI predict performance on other benchmarks?
Yes. ARC-AGI scores have a Pearson correlation of 0.94 with Cybench scores across the 13 models tested on both. Models that do well on ARC-AGI tend to do well on Cybench.
How often is ARC-AGI data refreshed?
BenchGecko pulls updates daily. New model scores on ARC-AGI appear as soon as they are published by Epoch AI or the model provider.
- Category: Reasoning
- Maximum score: 100
- Models: 48
- Updated: 2026-03-05
Other reasoning benchmarks
Same category · related evaluations