Benchmark · Reasoning · Deterministic

ARC-AGI

ARC-AGI · the original Abstraction and Reasoning Corpus. It tests whether AI systems can solve novel visual pattern-recognition tasks without relying on memorization.
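To make "novel visual pattern-recognition tasks" concrete, here is a minimal Python sketch of a single toy task. It assumes the layout of the public ARC dataset (a `train` list and a `test` list of input/output grid pairs, with grid cells as integers 0 to 9) and exact-match grading of each test output. The task, the `mirror` rule, and the helper names are hypothetical, and this page does not state how its own scores are computed.

```python
from typing import Callable

# Assumed layout of an ARC task: "train" and "test" lists of input/output
# grid pairs; each grid is a list of rows, each cell an int from 0 to 9.
Grid = list[list[int]]
Rule = Callable[[Grid], Grid]

# Hypothetical toy task whose hidden rule is "mirror each row".
TOY_TASK = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [6, 0]], "output": [[5, 0], [0, 6]]},
    ],
}


def mirror(grid: Grid) -> Grid:
    """Hypothetical candidate rule: reverse every row (horizontal mirror)."""
    return [list(reversed(row)) for row in grid]


def fits_train(rule: Rule, task: dict) -> bool:
    """Accept a rule only if it reproduces every training output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


def score_task(rule: Rule, task: dict) -> float:
    """Fraction of test grids predicted exactly (no partial credit per grid)."""
    tests = task["test"]
    hits = sum(rule(pair["input"]) == pair["output"] for pair in tests)
    return hits / len(tests)


if __name__ == "__main__":
    print(fits_train(mirror, TOY_TASK))  # True
    print(score_task(mirror, TOY_TASK))  # 1.0
```

A solver only sees the training pairs, so it has to infer the rule and apply it to an unseen test input; the specific test grid cannot be memorized in advance.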

Updated 2026-03-05
Models tested: 48
Top score: 98.0 (Gemini 3.1 Pro Preview)
Median: 42.8 · Min: 0.1
Top-5 spread: σ 2.4 · Competitive

Best score over time

[Chart: ARC-AGI frontier running max across 43 models; score (0 to 100) vs. release date, Nov 2024 to Mar 2026. Source: benchgecko.ai/benchmark/arc-agi]
The ARC-AGI frontier rose from 4.5 to 98.0 in 15 months (+93.5 points); the latest leader is Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots mark frontier records (12 in total).
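One way to read the "frontier running max" and its frontier records: sort models by release date and keep each model whose score beats every earlier one. The Python sketch below assumes that definition (strictly greater, so a tie with the current best is not a new record). `Entry`, `frontier_records`, and the demo entries are hypothetical and are not the site's data or code.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    release: str  # ISO date "YYYY-MM-DD", so string order is chronological
    model: str
    score: float


def frontier_records(entries: list[Entry]) -> list[Entry]:
    """Return the entries that set a new running maximum in release order.

    Assumption: only a strictly higher score counts as a new record.
    """
    records: list[Entry] = []
    best = float("-inf")
    for entry in sorted(entries, key=lambda e: e.release):
        if entry.score > best:
            best = entry.score
            records.append(entry)
    return records


if __name__ == "__main__":
    # Hypothetical entries for illustration only, not real models or scores.
    demo = [
        Entry("2025-03-01", "model-b", 30.0),
        Entry("2024-11-20", "model-a", 5.0),
        Entry("2025-06-01", "model-c", 25.0),
        Entry("2025-09-01", "model-d", 60.0),
    ]
    recs = frontier_records(demo)
    print([r.model for r in recs])         # ['model-a', 'model-b', 'model-d']
    print(recs[-1].score - recs[0].score)  # 55.0
```

The frontier gain in the caption is the same subtraction on the real endpoints: 98.0 - 4.5 = 93.5 points.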

48 tested models · ranked by score

Rank  Model                           Developer        Score
1     Gemini 3.1 Pro Preview          Google DeepMind   98.0
2     GPT-5.4 Pro                     OpenAI            94.5
3     Claude Opus 4.6                 Anthropic         94.0
4     GPT-5.4                         OpenAI            93.7
5     GPT-5.2 Pro                     OpenAI            90.5
6     Claude Sonnet 4.6               Anthropic         86.5
7     GPT-5.2                         OpenAI            86.2
8     Claude Opus 4.5                 Anthropic         80.0
9     Gemini 3 Pro                    Google DeepMind   75.0
10    GPT-5.1                         OpenAI            72.8
11    GPT-5 Pro                       OpenAI            70.2
12    Grok 4                          xAI               66.7
13    GPT-5                           OpenAI            65.7
14    Kimi K2.5                       moonshotai        65.3
15    Claude Sonnet 4.5               Anthropic         63.7
16    MiniMax M2.5                    minimax           63.7
17    o3                              OpenAI            60.8
18    o3 Pro                          OpenAI            59.3
19    o4 Mini                         OpenAI            58.7
20    DeepSeek V3.2                   DeepSeek          57.0
21    GPT-5 Mini                      OpenAI            54.3
22    Grok 4 Fast                     xAI               48.5
23    Claude Haiku 4.5                Anthropic         47.7
24    GLM 5                           z-ai              44.7
25    Gemini 2.5 Pro                  Google DeepMind   41.0
26    Claude Sonnet 4                 Anthropic         40.0
27    Claude Opus 4                   Anthropic         35.7
28    o3 Mini                         OpenAI            34.5
29    Gemini 2.5 Flash                Google DeepMind   32.3
30    o1                              OpenAI            30.7
31    Claude 3.7 Sonnet               Anthropic         28.6
32    Gemini 3 Flash Preview          Google DeepMind   21.5
33    R1 0528                         DeepSeek          21.2
34    GPT-5 Nano                      OpenAI            20.7
35    o1-preview                      OpenAI            18.0
36    Grok 3 Mini                     xAI               16.5
37    R1                              DeepSeek          15.8
38    o1-mini                         OpenAI            14.0
39    Qwen3 235B A22B Instruct 2507   Alibaba Qwen      11.0
40    GPT-4.5                         OpenAI            10.3
41    GPT-4.1                         OpenAI             5.5
42    Grok 3                          xAI                5.5
43    Magistral Small 1.1             (not shown)        5.0
44    GPT-4o (2024-11-20)             OpenAI             4.5
45    Llama 4 Maverick                Meta               4.4
46    GPT-4.1 Mini                    OpenAI             3.5
47    Llama 4 Scout                   Meta               0.5
48    GPT-4.1 Nano                    OpenAI             0.1
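The headline statistics can be recomputed from the 48 leaderboard scores. This is a minimal Python sketch that assumes "top-5 spread" means the population standard deviation of the five highest scores (the page does not name the estimator). Under that assumption it yields a median of 42.85 (displayed as 42.8) and σ ≈ 2.39 (displayed as 2.4); the sample standard deviation would give about 2.67 instead.

```python
from statistics import median, pstdev

# Scores copied from the ARC-AGI leaderboard table, in rank order (48 models).
SCORES = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0, 72.8,
    70.2, 66.7, 65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7, 57.0,
    54.3, 48.5, 47.7, 44.7, 41.0, 40.0, 35.7, 34.5, 32.3, 30.7,
    28.6, 21.5, 21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0, 10.3,
    5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]


def summarize(scores: list[float]) -> dict[str, float]:
    """Recompute the page's headline statistics.

    Assumption: "top-5 spread" is the population standard deviation
    (pstdev) of the five highest scores.
    """
    ranked = sorted(scores, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "median": median(ranked),  # mean of the 24th and 25th scores
        "min": ranked[-1],
        "top5_sigma": pstdev(ranked[:5]),
    }


if __name__ == "__main__":
    for name, value in summarize(SCORES).items():
        print(f"{name}: {round(value, 2)}")
```

With an even count of 48, the median is the mean of GLM 5 (44.7) and Gemini 2.5 Pro (41.0).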
Details
Category: Reasoning
Maximum score: 100
Models: 48
Updated: 2026-03-05

Same category · Related evaluations