Beta
Benchmark · Reasoning · Settled

ARC-AGI

ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.

Updated: 2026-03-05
Models tested: 48
Top score: 98.0 (Gemini 3.1 Pro Preview)
Median: 42.8
Lowest: 0.1
Top-5 gap: σ 2.4 (contested)

Best score over time · one chart, every benchmark

[Chart: ARC-AGI · 43 models · frontier running max. Y axis: score (0 to 100). X axis: release date (Nov 24 to Mar 26). Source: benchgecko.ai/benchmark/arc-agi · frontier]
Frontier on ARC-AGI rose from 4.5 to 98.0 in 15 months · +93.5 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots = frontier records · 12 total
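A frontier "running max" like the one charted above can be sketched as follows: walk the results in release order and keep only the entries that beat every earlier score. The page's per-model release dates are not reproduced in this text, so the model names, dates, and intermediate scores below are hypothetical placeholders; only the 4.5 and 98.0 endpoints come from the page. The function name `frontier_records` is also an assumption, not the site's implementation.

```python
from datetime import date


def frontier_records(results):
    """Return the entries that set a new best score, in release order."""
    records = []
    best = float("-inf")
    # Tuples sort by release date first, so this walks results chronologically.
    for released, name, score in sorted(results):
        if score > best:
            best = score
            records.append((released, name, score))
    return records


# Hypothetical placeholder data: names, dates, and middle scores are invented.
# Only the first (4.5) and last (98.0) scores match the page.
results = [
    (date(2024, 11, 1), "model-a", 4.5),
    (date(2025, 1, 15), "model-b", 3.0),   # below the frontier, not a record
    (date(2025, 4, 1), "model-c", 30.0),
    (date(2025, 6, 1), "model-d", 25.0),   # below the frontier, not a record
    (date(2026, 2, 1), "model-e", 98.0),
]

records = frontier_records(results)
# Total frontier gain: last record minus first record.
gain = records[-1][2] - records[0][2]
```

With this placeholder data, three of the five entries are frontier records, and the gain between the first and last record is 93.5 points, matching the figure quoted above.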

48 models tested · sorted by score

# · Developer · Model · Score
1 · Google DeepMind · Gemini 3.1 Pro Preview · 98.0
2 · OpenAI · GPT-5.4 Pro · 94.5
3 · Anthropic · Claude Opus 4.6 · 94.0
4 · OpenAI · GPT-5.4 · 93.7
5 · OpenAI · GPT-5.2 Pro · 90.5
6 · Anthropic · Claude Sonnet 4.6 · 86.5
7 · OpenAI · GPT-5.2 · 86.2
8 · Anthropic · Claude Opus 4.5 · 80.0
9 · Google DeepMind · Gemini 3 Pro · 75.0
10 · OpenAI · GPT-5.1 · 72.8
11 · OpenAI · GPT-5 Pro · 70.2
12 · xAI · Grok 4 · 66.7
13 · OpenAI · GPT-5 · 65.7
14 · moonshotai · Kimi K2.5 · 65.3
15 · Anthropic · Claude Sonnet 4.5 · 63.7
16 · minimax · MiniMax M2.5 · 63.7
17 · OpenAI · o3 · 60.8
18 · OpenAI · o3 Pro · 59.3
19 · OpenAI · o4 Mini · 58.7
20 · DeepSeek · DeepSeek V3.2 · 57.0
21 · OpenAI · GPT-5 Mini · 54.3
22 · xAI · Grok 4 Fast · 48.5
23 · Anthropic · Claude Haiku 4.5 · 47.7
24 · z-ai · GLM 5 · 44.7
25 · Google DeepMind · Gemini 2.5 Pro · 41.0
26 · Anthropic · Claude Sonnet 4 · 40.0
27 · Anthropic · Claude Opus 4 · 35.7
28 · OpenAI · o3 Mini · 34.5
29 · Google DeepMind · Gemini 2.5 Flash · 32.3
30 · OpenAI · o1 · 30.7
31 · Anthropic · Claude 3.7 Sonnet · 28.6
32 · Google DeepMind · Gemini 3 Flash Preview · 21.5
33 · DeepSeek · R1 0528 · 21.2
34 · OpenAI · GPT-5 Nano · 20.7
35 · OpenAI · o1-preview · 18.0
36 · xAI · Grok 3 Mini · 16.5
37 · DeepSeek · R1 · 15.8
38 · OpenAI · o1-mini · 14.0
39 · Alibaba Qwen · Qwen3 235B A22B Instruct 2507 · 11.0
40 · OpenAI · GPT-4.5 · 10.3
41 · OpenAI · GPT-4.1 · 5.5
42 · xAI · Grok 3 · 5.5
43 · (not listed) · Magistral Small 1.1 · 5.0
44 · OpenAI · GPT-4o (2024-11-20) · 4.5
45 · Meta · Llama 4 Maverick · 4.4
46 · OpenAI · GPT-4.1 Mini · 3.5
47 · Meta · Llama 4 Scout · 0.5
48 · OpenAI · GPT-4.1 Nano · 0.1
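The summary figures reported for this benchmark (median 42.8, lowest 0.1, top-5 gap σ 2.4) can be checked against the 48 scores listed above. The sketch below is one way to do that with Python's standard library. The page does not define how σ is computed; treating it as the population standard deviation of the top five scores is an assumption, chosen because it reproduces the displayed 2.4.

```python
from statistics import median, pstdev

# Scores copied from the leaderboard, in rank order (1 to 48).
scores = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0, 72.8, 70.2, 66.7,
    65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7, 57.0, 54.3, 48.5, 47.7, 44.7,
    41.0, 40.0, 35.7, 34.5, 32.3, 30.7, 28.6, 21.5, 21.2, 20.7, 18.0, 16.5,
    15.8, 14.0, 11.0, 10.3, 5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

# With 48 entries, the median is the midpoint of ranks 24 and 25.
med = median(scores)

# Assumption: "top-5 gap" is the population standard deviation of the
# five highest scores. The page does not state its formula.
top5 = sorted(scores, reverse=True)[:5]
top5_sigma = pstdev(top5)

print(f"models={len(scores)} top={max(scores)} low={min(scores)}")
print(f"median={med:.2f} top5_sigma={top5_sigma:.2f}")
```

The median works out to the midpoint of 44.7 and 41.0, about 42.85, which agrees with the displayed 42.8 up to rounding. The sample standard deviation (`statistics.stdev`) of the same five scores is about 2.67, so it would not reproduce the displayed value.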
Details
Category: Reasoning
Max score: 100
Models: 48
Updated: 2026-03-05

Same category · Related benchmarks