Beta
Benchmark · Reasoning · Competitive

ARC-AGI-2

ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.

Updated 2026-03-05
Models tested: 50
Top score: 83.3 (GPT-5.4 Pro)
Median: 4.7 · lowest 0.1
Top-5 spread: σ 7.7 · highly competitive

Best score over time · one chart, every benchmark

[Chart: ARC-AGI-2 · 45 models · frontier running max. Y-axis: score, 0 to 100 (higher is better). X-axis: release date, Jul 24 to Mar 26. Source: benchgecko.ai/benchmark/arc-agi-2 · frontier]
Frontier on ARC-AGI-2 rose from 0.1 to 83.3 in 20 months · +83.2 points · latest leader GPT-5.4 Pro from OpenAI.
Pink dots = frontier records · 13 total
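The "frontier running max" in the chart can be read as: walk through models in release-date order and keep only those that beat every earlier score. The following is a minimal sketch of that idea, not the site's actual implementation. The function name `frontier_records` and the sample rows are hypothetical placeholders; the page does not list per-model release dates, so none of the sample values come from the leaderboard.

```python
from datetime import date


def frontier_records(results):
    """Return the (release_date, model, score) rows that set a new best score.

    `results` is an iterable of (release_date, model, score) tuples.
    Rows are processed in release-date order; a row is a frontier record
    only if its score is strictly higher than every earlier score.
    """
    best = float("-inf")
    records = []
    for release_date, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((release_date, model, score))
    return records


# Hypothetical placeholder data, NOT taken from the leaderboard.
sample = [
    (date(2025, 1, 1), "model-a", 1.0),
    (date(2025, 3, 1), "model-b", 3.0),
    (date(2025, 2, 1), "model-c", 0.5),
    (date(2025, 4, 1), "model-d", 2.0),
    (date(2025, 5, 1), "model-e", 5.0),
]

records = frontier_records(sample)
for release_date, model, score in records:
    print(release_date.isoformat(), model, score)
```

With real data, the number of rows returned would correspond to the count of pink dots, and the last row to the current leader.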

50 models tested · sorted by score

| # | Developer | Model | Score |
|---|-----------|-------|-------|
| 1 | OpenAI | GPT-5.4 Pro | 83.3 |
| 2 | Google DeepMind | Gemini 3.1 Pro Preview | 77.1 |
| 3 | OpenAI | GPT-5.4 | 74.0 |
| 4 | Anthropic | Claude Opus 4.6 | 69.2 |
| 5 | Anthropic | Claude Sonnet 4.6 | 60.4 |
| 6 | OpenAI | GPT-5.2 Pro | 54.2 |
| 7 | OpenAI | GPT-5.2 | 52.9 |
| 8 | Anthropic | Claude Opus 4.5 | 37.6 |
| 9 | Google DeepMind | Gemini 3 Flash Preview | 33.6 |
| 10 | Google DeepMind | Gemini 3 Pro | 31.1 |
| 11 | OpenAI | GPT-5 Pro | 18.3 |
| 12 | OpenAI | GPT-5.1 | 17.6 |
| 13 | xAI | Grok 4 | 16.0 |
| 14 | Anthropic | Claude Sonnet 4.5 | 13.6 |
| 15 | moonshotai | Kimi K2.5 | 11.8 |
| 16 | OpenAI | GPT-5 | 9.9 |
| 17 | Anthropic | Claude Opus 4 | 8.6 |
| 18 | OpenAI | o3 | 6.5 |
| 19 | OpenAI | o4 Mini | 6.1 |
| 20 | Anthropic | Claude Sonnet 4 | 5.9 |
| 21 | xAI | Grok 4 Fast | 5.3 |
| 22 | Google DeepMind | Gemini 2.5 Pro | 4.9 |
| 23 | z-ai | GLM 5 | 4.9 |
| 24 | minimax | MiniMax M2.5 | 4.9 |
| 25 | OpenAI | o3 Pro | 4.9 |
| 26 | OpenAI | GPT-5 Mini | 4.4 |
| 27 | Anthropic | Claude Haiku 4.5 | 4.0 |
| 28 | DeepSeek | DeepSeek V3.2 | 4.0 |
| 29 | OpenAI | o3 Mini | 3.0 |
| 30 | OpenAI | GPT-5 Nano | 2.6 |
| 31 | Google DeepMind | Gemini 2.5 Flash | 2.5 |
| 32 | Google DeepMind | Gemini 2.0 Flash | 1.3 |
| 33 | DeepSeek | R1 | 1.3 |
| 34 | Alibaba Qwen | Qwen3 235B A22B Instruct 2507 | 1.3 |
| 35 | DeepSeek | R1 0528 | 1.1 |
| 36 | Anthropic | Claude 3.7 Sonnet | 0.9 |
| 37 | OpenAI | o1-mini | 0.8 |
| 38 | Google DeepMind | Gemini 1.5 Pro (Feb 2024) | 0.8 |
| 39 | OpenAI | GPT-4.5 | 0.8 |
| 40 | OpenAI | GPT-4.1 | 0.4 |
| 41 | xAI | Grok 3 Mini | 0.4 |
| 42 | OpenAI | GPT-4.1 Mini | 0.1 |
| 43 | OpenAI | GPT-4.1 Nano | 0.1 |
| 44 | OpenAI | GPT-4o (2024-11-20) | 0.1 |
| 45 | OpenAI | GPT-4o-mini | 0.1 |
| 46 | OpenAI | GPT-4o-mini (2024-07-18) | 0.1 |
| 47 | xAI | Grok 3 | 0.1 |
| 48 | Meta | Llama 4 Maverick | 0.1 |
| 49 | Meta | Llama 4 Scout | 0.1 |
| 50 | (not listed) | Magistral Small 1.1 | 0.1 |
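The summary figures at the top of the page (median 4.7, lowest 0.1, top-5 σ 7.7) can be checked against the 50 scores in the leaderboard. The sketch below is one way to do that; the helper name `summarize` is hypothetical, and using the population standard deviation for σ is an assumption, chosen because it is the variant that matches the displayed 7.7 (the sample standard deviation of the same five scores is about 8.6).

```python
from statistics import median, pstdev

# The 50 leaderboard scores, in rank order, transcribed from this page.
scores = [
    83.3, 77.1, 74.0, 69.2, 60.4, 54.2, 52.9, 37.6, 33.6, 31.1,
    18.3, 17.6, 16.0, 13.6, 11.8, 9.9, 8.6, 6.5, 6.1, 5.9,
    5.3, 4.9, 4.9, 4.9, 4.9, 4.4, 4.0, 4.0, 3.0, 2.6,
    2.5, 1.3, 1.3, 1.3, 1.1, 0.9, 0.8, 0.8, 0.8, 0.4,
    0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
]


def summarize(values, top_n=5):
    """Compute leaderboard summary statistics (hypothetical helper)."""
    ranked = sorted(values, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "lowest": ranked[-1],
        "median": median(ranked),
        # Assumption: population (not sample) standard deviation.
        "top_n_sigma": pstdev(ranked[:top_n]),
    }


summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With an even count of 50, the median is the midpoint of the 25th and 26th scores (4.9 and 4.4), which is 4.65 and consistent with the displayed 4.7 after rounding to one decimal place.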
Details
Category: Reasoning
Max score: 100
Models: 50
Updated: 2026-03-05

Same category · Related benchmarks