ARC-AGI
ARC-AGI is the original Abstraction and Reasoning Corpus. It tests whether AI can solve novel visual pattern recognition tasks without relying on memorization.
The Frontier
Best score over time · one chart, every benchmark
Full ranking
48 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 98.0 |
| 2 | | 94.5 |
| 3 | | 94.0 |
| 4 | | 93.7 |
| 5 | | 90.5 |
| 6 | | 86.5 |
| 7 | | 86.2 |
| 8 | | 80.0 |
| 9 | | 75.0 |
| 10 | | 72.8 |
| 11 | | 70.2 |
| 12 | | 66.7 |
| 13 | | 65.7 |
| 14 | | 65.3 |
| 15 | | 63.7 |
| 16 | | 63.7 |
| 17 | | 60.8 |
| 18 | | 59.3 |
| 19 | | 58.7 |
| 20 | | 57.0 |
| 21 | | 54.3 |
| 22 | | 48.5 |
| 23 | | 47.7 |
| 24 | | 44.7 |
| 25 | | 41.0 |
| 26 | | 40.0 |
| 27 | | 35.7 |
| 28 | | 34.5 |
| 29 | | 32.3 |
| 30 | | 30.7 |
| 31 | | 28.6 |
| 32 | | 21.5 |
| 33 | | 21.2 |
| 34 | | 20.7 |
| 35 | | 18.0 |
| 36 | | 16.5 |
| 37 | | 15.8 |
| 38 | | 14.0 |
| 39 | | 11.0 |
| 40 | | 10.3 |
| 41 | | 5.5 |
| 42 | | 5.5 |
| 43 | Magistral Small 1.1 | 5.0 |
| 44 | | 4.5 |
| 45 | | 4.4 |
| 46 | | 3.5 |
| 47 | | 0.5 |
| 48 | | 0.1 |
Score distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with ARC-AGI
Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.
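The correlation described above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the function names `pearson_r` and `correlate_benchmarks`, the model identifiers, and every score in the example are hypothetical. It assumes each benchmark is stored as a mapping from model name to score and that only models present in both mappings are compared.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate_benchmarks(scores_a, scores_b):
    """Return (r, n) computed over the models scored on both benchmarks."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    r = pearson_r([scores_a[m] for m in shared], [scores_b[m] for m in shared])
    return r, len(shared)


# Hypothetical data: model names and scores are made up for illustration.
arc = {"model-a": 90.0, "model-b": 60.0, "model-c": 30.0, "model-d": 10.0}
other = {"model-a": 45.0, "model-b": 30.0, "model-c": 15.0, "model-e": 70.0}

r, n = correlate_benchmarks(arc, other)
print(f"r = {r:.2f} across {n} shared models")
```

Models scored on only one of the two benchmarks ("model-d" and "model-e" here) are dropped before the coefficient is computed, so the reported model count is the size of the overlap, not the size of either leaderboard.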
Frequently asked questions
About ARC-AGI
What does ARC-AGI measure?
ARC-AGI is the original Abstraction and Reasoning Corpus. It tests whether AI can solve novel visual pattern recognition tasks without relying on memorization. 48 AI models have been tested on it, with scores ranging from 0.1 to 98.0 out of 100.
Which model leads on ARC-AGI?
Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 48 tested models is 42.8.
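The median quoted above can be checked against the 48 scores listed in the ranking table. The sketch below copies those scores into a list (the variable names are illustrative) and uses Python's standard `statistics.median`, which averages the two middle values when the count is even. Here those are the 24th and 25th scores, 44.7 and 41.0, giving roughly 42.85, consistent with the reported 42.8 up to rounding.

```python
from statistics import median

# Scores copied from the ranking table, highest first.
scores = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0,
    75.0, 72.8, 70.2, 66.7, 65.7, 65.3, 63.7, 63.7,
    60.8, 59.3, 58.7, 57.0, 54.3, 48.5, 47.7, 44.7,
    41.0, 40.0, 35.7, 34.5, 32.3, 30.7, 28.6, 21.5,
    21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0, 10.3,
    5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

median_score = median(scores)  # mean of the 24th and 25th values
top_score = max(scores)
low_score = min(scores)

print(f"{len(scores)} models, median {median_score:.2f}, "
      f"range {low_score} to {top_score}")
```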
Is ARC-AGI saturated?
Yes. The top model on ARC-AGI has reached 98.0 out of 100, which is 2.0 points short of the maximum and within 5% of the theoretical ceiling. The benchmark is approaching saturation and may be replaced by a harder successor.
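The "within 5% of the ceiling" test can be written as a one-line check. This is only a sketch: the function name `is_near_ceiling` is hypothetical, and it assumes the 5% tolerance is measured as a fraction of the maximum score, which the page does not spell out.

```python
def is_near_ceiling(top_score, max_score=100.0, tolerance=0.05):
    """True when top_score lies within tolerance * max_score of max_score."""
    return (max_score - top_score) <= tolerance * max_score


gap = 100.0 - 98.0           # points left before the maximum score
near = is_near_ceiling(98.0)  # 2.0 <= 5.0
print(f"gap = {gap:.1f} points, near ceiling: {near}")
```

Under the same assumption, a top score of 94.0 would not qualify, because its 6.0-point gap exceeds the 5.0-point tolerance.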
Does ARC-AGI predict performance on other benchmarks?
Yes. ARC-AGI scores have a Pearson correlation of 0.94 with Cybench scores across the 13 models evaluated on both benchmarks. Models that do well on ARC-AGI tend to do well on Cybench.
How often is ARC-AGI data refreshed?
BenchGecko pulls updates daily. New model scores on ARC-AGI appear as soon as they are published by Epoch AI or the model provider.
- Category: Reasoning
- Max score: 100
- Models: 48
- Updated: 2026-03-05