Which model leads on ARC-AGI?

Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 49 tested models is 44.7.

Is ARC-AGI saturated?

Yes · the top model on ARC-AGI has reached 98.0 out of 100, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
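The saturation call above is simple arithmetic. A minimal sketch, assuming "saturated" means the top score sits within 5% of a 100-point ceiling; the function name `is_near_ceiling` and its `margin` parameter are illustrative and not part of BenchGecko.

```python
def is_near_ceiling(top_score: float, ceiling: float = 100.0, margin: float = 0.05) -> bool:
    """Return True when top_score is within `margin` (a fraction of the ceiling) of the ceiling."""
    return (ceiling - top_score) <= margin * ceiling


# Top ARC-AGI score from this page: 98.0 is 2 points below 100, inside the 5-point margin.
print(is_near_ceiling(98.0))  # True
```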

Does ARC-AGI predict performance on other benchmarks?

Yes · ARC-AGI scores correlate 0.94 with Cybench across 13 shared models. Models that do well on ARC-AGI tend to do well on Cybench.
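For reference, a correlation like the 0.94 above is computed over the models that both benchmarks share. The page does not say which coefficient BenchGecko uses; this sketch assumes Pearson's r, and the sample scores below are made up for illustration, not real Cybench data.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for four shared models (illustration only).
arc_scores = [10.0, 40.0, 70.0, 95.0]
other_scores = [5.0, 20.0, 35.0, 47.5]  # exactly half of arc_scores, so perfectly linear
print(round(pearson(arc_scores, other_scores), 2))  # 1.0
```

With only 13 shared models, a high coefficient shows that the two rankings move together; it does not show that one benchmark causes or guarantees results on the other.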

How often is ARC-AGI data refreshed?

BenchGecko pulls updates daily. New model scores on ARC-AGI appear within a day of being published by Epoch AI or the model provider.

Benchmark · Reasoning · Settled

ARC-AGI

Name: ARC-AGI Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.

Updated 2026-04-23

Models tested: 49
Top score: 98.0 (Gemini 3.1 Pro Preview)
Median: 44.7 (min 0.1)
Top-5 spread: σ 1.6 (Settled)

The Frontier

Best score over time · one chart, every benchmark

Frontier on ARC-AGI rose from 4.5 to 98.0 in 15 months · +93.5 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.

Frontier records · 12 total
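A frontier record is a result that beats every earlier score on the benchmark. A minimal sketch of that running-maximum logic, assuming each result carries an ISO date; the function name, the model names, and the dates below are hypothetical, since this page lists scores without dates.

```python
def frontier_records(results: list[tuple[str, str, float]]) -> list[tuple[str, str, float]]:
    """Return the (date, model, score) entries that set a new best score, in date order."""
    records = []
    best = float("-inf")
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for date, model, score in sorted(results):
        if score > best:
            best = score
            records.append((date, model, score))
    return records


# Hypothetical timeline (illustration only).
sample = [
    ("2025-01-10", "model-a", 4.5),
    ("2025-03-02", "model-b", 3.0),   # below the current best, so not a record
    ("2025-06-15", "model-c", 30.7),
    ("2026-04-01", "model-d", 98.0),
]
for record in frontier_records(sample):
    print(record)
```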

Full rankings

49 models tested · sorted by score

#	Model	Score	Price
1	Gemini 3.1 Pro Preview · Google DeepMind	98.0	$2.00
2	GPT-5.5 · OpenAI	95.0	$5.00
3	GPT-5.4 Pro · OpenAI	94.5	$30.00
4	Claude Opus 4.6 · Anthropic	94.0	$5.00
5	GPT-5.4 · OpenAI	93.7	$2.50
6	GPT-5.2 Pro · OpenAI	90.5	$21.00
7	Claude Sonnet 4.6 · Anthropic	86.5	$3.00
8	GPT-5.2 · OpenAI	86.2	$1.75
9	Claude Opus 4.5 · Anthropic	80.0	$5.00
10	Gemini 3 Pro · Google DeepMind	75.0	—
11	GPT-5.1 · OpenAI	72.8	$1.25
12	GPT-5 Pro · OpenAI	70.2	$15.00
13	Grok 4 · xAI	66.7	$3.00
14	GPT-5 · OpenAI	65.7	$1.25
15	Kimi K2.5 · moonshotai	65.3	$0.44
16	Claude Sonnet 4.5 · Anthropic	63.7	$3.00
17	MiniMax M2.5 · minimax	63.7	$0.15
18	o3 · OpenAI	60.8	$2.00
19	o3 Pro · OpenAI	59.3	$20.00
20	o4 Mini · OpenAI	58.7	$1.10
21	DeepSeek V3.2 · DeepSeek	57.0	$0.25
22	GPT-5 Mini · OpenAI	54.3	$0.25
23	Grok 4 Fast · xAI	48.5	$0.20
24	Claude Haiku 4.5 · Anthropic	47.7	$1.00
25	GLM 5 · z-ai	44.7	$0.60
26	Gemini 2.5 Pro · Google DeepMind	41.0	$1.25
27	Claude Sonnet 4 · Anthropic	40.0	$3.00
28	Claude Opus 4 · Anthropic	35.7	$15.00
29	o3 Mini · OpenAI	34.5	$1.10
30	Gemini 2.5 Flash · Google DeepMind	32.3	$0.30
31	o1 · OpenAI	30.7	$15.00
32	Claude 3.7 Sonnet · Anthropic	28.6	$3.00
33	Gemini 3 Flash Preview · Google DeepMind	21.5	$0.50
34	R1 0528 · DeepSeek	21.2	$0.50
35	GPT-5 Nano · OpenAI	20.7	$0.05
36	o1-preview · OpenAI	18.0	—
37	Grok 3 Mini · xAI	16.5	$0.30
38	R1 · DeepSeek	15.8	$0.70
39	o1-mini · OpenAI	14.0	—
40	Qwen3 235B A22B Instruct 2507 · Alibaba Qwen	11.0	$0.07
41	GPT-4.5 · OpenAI	10.3	—
42	GPT-4.1 · OpenAI	5.5	$2.00
43	Grok 3 · xAI	5.5	$3.00
44	Magistral Small 1.1 · Unknown	5.0	—
45	GPT-4o (2024-11-20) · OpenAI	4.5	$2.50
46	Llama 4 Maverick · Meta	4.4	$0.15
47	GPT-4.1 Mini · OpenAI	3.5	$0.40
48	Llama 4 Scout · Meta	0.5	$0.08
49	GPT-4.1 Nano · OpenAI	0.1	$0.10
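The headline figures can be rechecked from the rankings. A short sketch using Python's standard `statistics` module, with the 49 scores copied from the table in rank order; the variable name `scores` is just for illustration.

```python
import statistics

# Scores from the rankings table, rank 1 through rank 49.
scores = [
    98.0, 95.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0,
    72.8, 70.2, 66.7, 65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7,
    57.0, 54.3, 48.5, 47.7, 44.7, 41.0, 40.0, 35.7, 34.5, 32.3,
    30.7, 28.6, 21.5, 21.2, 20.7, 18.0, 16.5, 15.8, 14.0, 11.0,
    10.3, 5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]

print(len(scores))                 # 49 models tested
print(max(scores))                 # 98.0 top score
print(statistics.median(scores))   # 44.7, the 25th of 49 scores
print(min(scores))                 # 0.1 minimum
```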