Benchmark · Reasoning · Settled

ARC-AGI

ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.

Updated: 2026-04-23
Models tested: 49
Top score: 98.0 (Gemini 3.1 Pro Preview)
Median: 44.7 (min 0.1)
Top-5 spread: σ 1.6 · Settled

Best score over time · one chart, every benchmark

[Chart: ARC-AGI · 44 models · frontier running max. Y axis: score, 0 to 100. X axis: release date, Nov 2024 to Apr 2026. Source: benchgecko.ai/benchmark/arc-agi]
Frontier on ARC-AGI rose from 4.5 to 98.0 in 15 months (+93.5 points). The latest leader is Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots mark frontier records (12 total).
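The "frontier running max" in the chart can be read as the best score seen so far at each release date, with a frontier record logged whenever a new model beats it. The Python sketch below illustrates that reading under stated assumptions: `frontier_records` is a hypothetical helper, the input pairs are made-up placeholder values rather than leaderboard data, and treating ties as non-records is a guess about how the site counts them.

```python
from datetime import date

def frontier_records(results):
    """Return the (release_date, score) pairs that set a new best score.

    Assumption: a tie with the current best does not count as a record.
    """
    records = []
    best = float("-inf")
    # Walk the results in release order and keep only strict improvements.
    for released, score in sorted(results):
        if score > best:
            best = score
            records.append((released, score))
    return records

# Hypothetical placeholder results, NOT taken from the leaderboard.
example = [
    (date(2025, 1, 10), 12.0),
    (date(2025, 3, 5), 9.0),    # below the running max: not a record
    (date(2025, 6, 20), 30.0),
    (date(2025, 9, 1), 30.0),   # tie: not a record under this assumption
    (date(2025, 12, 15), 55.0),
]

records = frontier_records(example)
gain = records[-1][1] - records[0][1]  # last record minus first record
print(len(records), gain)
```

Applied to the real release dates and scores, the same subtraction between the first and last records would correspond to the "+93.5 points" figure (98.0 minus 4.5).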

49 models tested · sorted by score

#   Provider         Model                          Score
1   Google DeepMind  Gemini 3.1 Pro Preview         98.0
2   OpenAI           GPT-5.5                        95.0
3   OpenAI           GPT-5.4 Pro                    94.5
4   Anthropic        Claude Opus 4.6                94.0
5   OpenAI           GPT-5.4                        93.7
6   OpenAI           GPT-5.2 Pro                    90.5
7   Anthropic        Claude Sonnet 4.6              86.5
8   OpenAI           GPT-5.2                        86.2
9   Anthropic        Claude Opus 4.5                80.0
10  Google DeepMind  Gemini 3 Pro                   75.0
11  OpenAI           GPT-5.1                        72.8
12  OpenAI           GPT-5 Pro                      70.2
13  xAI              Grok 4                         66.7
14  OpenAI           GPT-5                          65.7
15  moonshotai       Kimi K2.5                      65.3
16  Anthropic        Claude Sonnet 4.5              63.7
17  minimax          MiniMax M2.5                   63.7
18  OpenAI           o3                             60.8
19  OpenAI           o3 Pro                         59.3
20  OpenAI           o4 Mini                        58.7
21  DeepSeek         DeepSeek V3.2                  57.0
22  OpenAI           GPT-5 Mini                     54.3
23  xAI              Grok 4 Fast                    48.5
24  Anthropic        Claude Haiku 4.5               47.7
25  z-ai             GLM 5                          44.7
26  Google DeepMind  Gemini 2.5 Pro                 41.0
27  Anthropic        Claude Sonnet 4                40.0
28  Anthropic        Claude Opus 4                  35.7
29  OpenAI           o3 Mini                        34.5
30  Google DeepMind  Gemini 2.5 Flash               32.3
31  OpenAI           o1                             30.7
32  Anthropic        Claude 3.7 Sonnet              28.6
33  Google DeepMind  Gemini 3 Flash Preview         21.5
34  DeepSeek         R1 0528                        21.2
35  OpenAI           GPT-5 Nano                     20.7
36  OpenAI           o1-preview                     18.0
37  xAI              Grok 3 Mini                    16.5
38  DeepSeek         R1                             15.8
39  OpenAI           o1-mini                        14.0
40  Alibaba Qwen     Qwen3 235B A22B Instruct 2507  11.0
41  OpenAI           GPT-4.5                        10.3
42  OpenAI           GPT-4.1                        5.5
43  xAI              Grok 3                         5.5
44  -                Magistral Small 1.1            5.0
45  OpenAI           GPT-4o (2024-11-20)            4.5
46  Meta             Llama 4 Maverick               4.4
47  OpenAI           GPT-4.1 Mini                   3.5
48  Meta             Llama 4 Scout                  0.5
49  OpenAI           GPT-4.1 Nano                   0.1
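
The model count, top score, median, and minimum reported on this page can be rechecked from the scores listed above. The Python sketch below assumes the 49 scores are transcribed correctly in rank order; the `scores` and `summary` names are illustrative and not part of the site. It does not attempt the top-5 spread, since the page does not say how σ is computed.

```python
from statistics import median

# Scores transcribed from the leaderboard above, in rank order (49 entries).
scores = [
    98.0, 95.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0,
    72.8, 70.2, 66.7, 65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7,
    57.0, 54.3, 48.5, 47.7, 44.7, 41.0, 40.0, 35.7, 34.5, 32.3,
    30.7, 28.6, 21.5, 21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0,
    10.3, 5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

summary = {
    "models": len(scores),
    "top": max(scores),
    # With 49 entries the median is the 25th score in sorted order.
    "median": median(scores),
    "min": min(scores),
}
print(summary)
```

With an odd number of entries the median is a single listed score, here the rank-25 entry (GLM 5).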

Details
Category: Reasoning
Max score: 100
Models: 49
Updated: 2026-04-23

Same category · related evaluations