Benchmark · Reasoning · Saturated

ARC-AGI

ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.

Updated: 2026-03-05
Models tested: 48
Top score: 98.0 (Gemini 3.1 Pro Preview)
Median: 42.8
Min.: 0.1
Top-5 spread: σ 2.4 (contested)

Best score over time · one chart, every benchmark

[Chart: ARC-AGI · 43 models · frontier running max. Y-axis: score, 0 to 100, higher is better. X-axis: release date, Nov 24 to Mar 26. Source: benchgecko.ai/benchmark/arc-agi]
Frontier on ARC-AGI rose from 4.5 to 98.0 in 15 months · +93.5 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots = frontier records · 12 total
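The "frontier running max" in the chart can be read as the best score seen so far, ordered by release date, with a frontier record being any result that beats every earlier one. The sketch below illustrates that reading only. The page does not publish per-model release dates or its exact rule, so the function name `frontier_records` and the toy data are hypothetical and are not taken from the leaderboard.

```python
from datetime import date


def frontier_records(results):
    """Return the (release_date, score) pairs that set a new running maximum.

    Assumption: a record must strictly exceed every earlier score; ties do not count.
    """
    records = []
    best = float("-inf")
    for released, score in sorted(results):  # oldest release first
        if score > best:
            best = score
            records.append((released, score))
    return records


# Hypothetical placeholder data for illustration only (not from the page).
toy_results = [
    (date(2025, 1, 1), 10.0),
    (date(2025, 2, 1), 8.0),   # below the running max, not a record
    (date(2025, 3, 1), 25.0),
    (date(2025, 4, 1), 25.0),  # tie with the running max, not a record
    (date(2025, 5, 1), 40.0),
]

records = frontier_records(toy_results)
gain = records[-1][1] - records[0][1]  # frontier gain from first to latest record
print(len(records), "records, gain of", gain, "points")
```

Applied to the figures quoted above, the same subtraction gives 98.0 - 4.5 = 93.5 points.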

48 models tested · sorted by score

#   Model                          Developer        Score
1   Gemini 3.1 Pro Preview         Google DeepMind   98.0
2   GPT-5.4 Pro                    OpenAI            94.5
3   Claude Opus 4.6                Anthropic         94.0
4   GPT-5.4                        OpenAI            93.7
5   GPT-5.2 Pro                    OpenAI            90.5
6   Claude Sonnet 4.6              Anthropic         86.5
7   GPT-5.2                        OpenAI            86.2
8   Claude Opus 4.5                Anthropic         80.0
9   Gemini 3 Pro                   Google DeepMind   75.0
10  GPT-5.1                        OpenAI            72.8
11  GPT-5 Pro                      OpenAI            70.2
12  Grok 4                         xAI               66.7
13  GPT-5                          OpenAI            65.7
14  Kimi K2.5                      moonshotai        65.3
15  Claude Sonnet 4.5              Anthropic         63.7
16  MiniMax M2.5                   minimax           63.7
17  o3                             OpenAI            60.8
18  o3 Pro                         OpenAI            59.3
19  o4 Mini                        OpenAI            58.7
20  DeepSeek V3.2                  DeepSeek          57.0
21  GPT-5 Mini                     OpenAI            54.3
22  Grok 4 Fast                    xAI               48.5
23  Claude Haiku 4.5               Anthropic         47.7
24  GLM 5                          z-ai              44.7
25  Gemini 2.5 Pro                 Google DeepMind   41.0
26  Claude Sonnet 4                Anthropic         40.0
27  Claude Opus 4                  Anthropic         35.7
28  o3 Mini                        OpenAI            34.5
29  Gemini 2.5 Flash               Google DeepMind   32.3
30  o1                             OpenAI            30.7
31  Claude 3.7 Sonnet              Anthropic         28.6
32  Gemini 3 Flash Preview         Google DeepMind   21.5
33  R1 0528                        DeepSeek          21.2
34  GPT-5 Nano                     OpenAI            20.7
35  o1-preview                     OpenAI            18.0
36  Grok 3 Mini                    xAI               16.5
37  R1                             DeepSeek          15.8
38  o1-mini                        OpenAI            14.0
39  Qwen3 235B A22B Instruct 2507  Alibaba Qwen      11.0
40  GPT-4.5                        OpenAI            10.3
41  GPT-4.1                        OpenAI             5.5
42  Grok 3                         xAI                5.5
43  Magistral Small 1.1            -                  5.0
44  GPT-4o (2024-11-20)            OpenAI             4.5
45  Llama 4 Maverick               Meta               4.4
46  GPT-4.1 Mini                   OpenAI             3.5
47  Llama 4 Scout                  Meta               0.5
48  GPT-4.1 Nano                   OpenAI             0.1
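The summary figures at the top of the page (median 42.8, min. 0.1, top-5 spread σ 2.4) can be checked against the 48 listed scores. The page does not state its formulas, so the sketch below assumes the median is the mean of the two middle scores and that σ is the population standard deviation of the five highest scores. Under those assumptions the median comes out at 42.85, which the page shows as 42.8, and σ rounds to 2.4.

```python
from statistics import median, pstdev

# Scores copied from the leaderboard, in rank order (48 entries).
scores = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0, 72.8, 70.2, 66.7,
    65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7, 57.0, 54.3, 48.5, 47.7, 44.7,
    41.0, 40.0, 35.7, 34.5, 32.3, 30.7, 28.6, 21.5, 21.2, 20.7, 18.0, 16.5,
    15.8, 14.0, 11.0, 10.3, 5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

# Median of an even-length list: mean of ranks 24 and 25 (44.7 and 41.0).
med = median(scores)

# Assumed definition of the top-5 spread: population standard deviation
# of the five highest scores.
top5 = sorted(scores, reverse=True)[:5]
top5_sigma = pstdev(top5)

print(f"models={len(scores)} top={max(scores)} min={min(scores)}")
print(f"median={med:.2f} top5_sigma={top5_sigma:.2f}")
```

The sample standard deviation (`statistics.stdev`) of the same five scores is about 2.67, so the population form is the one that matches the published 2.4.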
Details
Category: Reasoning
Max. score: 100
Models: 48
Updated: 2026-03-05

Same category · related evaluations