
ARC-AGI

ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.

Updated 2026-03-05
Models tested · 48
Top score · 98.0 (Gemini 3.1 Pro Preview)
Median · 42.8 (min 0.1)
Top-5 spread · σ 2.4 (competitive)

Best score over time · one chart, every benchmark

[Chart: best score on ARC-AGI over time · 43 models, frontier running max · score 0–100 vs. release date, Nov 2024–Mar 2026 · benchgecko.ai/benchmark/arc-agi]
Frontier on ARC-AGI rose from 4.5 to 98.0 in 15 months · +93.5 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots = frontier records · 12 total

Where models cluster

SCORE DISTRIBUTION (score bucket, 0 to 100 · models)
0–10 · 8
10–20 · 6
20–30 · 4
30–40 · 4
40–50 · 5
50–60 · 4
60–70 · 6
70–80 · 3
80–90 · 3
90–100 · 5
MEDIAN · 42.8


48 models tested · sorted by score

# · Model · Provider · Score
1 · Gemini 3.1 Pro Preview · Google DeepMind · 98.0
2 · GPT-5.4 Pro · OpenAI · 94.5
3 · Claude Opus 4.6 · Anthropic · 94.0
4 · GPT-5.4 · OpenAI · 93.7
5 · GPT-5.2 Pro · OpenAI · 90.5
6 · Claude Sonnet 4.6 · Anthropic · 86.5
7 · GPT-5.2 · OpenAI · 86.2
8 · Claude Opus 4.5 · Anthropic · 80.0
9 · Gemini 3 Pro · Google DeepMind · 75.0
10 · GPT-5.1 · OpenAI · 72.8
11 · GPT-5 Pro · OpenAI · 70.2
12 · Grok 4 · xAI · 66.7
13 · GPT-5 · OpenAI · 65.7
14 · Kimi K2.5 · Moonshot AI · 65.3
15 · Claude Sonnet 4.5 · Anthropic · 63.7
16 · MiniMax M2.5 · MiniMax · 63.7
17 · o3 · OpenAI · 60.8
18 · o3 Pro · OpenAI · 59.3
19 · o4 Mini · OpenAI · 58.7
20 · DeepSeek V3.2 · DeepSeek · 57.0
21 · GPT-5 Mini · OpenAI · 54.3
22 · Grok 4 Fast · xAI · 48.5
23 · Claude Haiku 4.5 · Anthropic · 47.7
24 · GLM 5 · Z.ai · 44.7
25 · Gemini 2.5 Pro · Google DeepMind · 41.0
26 · Claude Sonnet 4 · Anthropic · 40.0
27 · Claude Opus 4 · Anthropic · 35.7
28 · o3 Mini · OpenAI · 34.5
29 · Gemini 2.5 Flash · Google DeepMind · 32.3
30 · o1 · OpenAI · 30.7
31 · Claude 3.7 Sonnet · Anthropic · 28.6
32 · Gemini 3 Flash Preview · Google DeepMind · 21.5
33 · R1 0528 · DeepSeek · 21.2
34 · GPT-5 Nano · OpenAI · 20.7
35 · o1-preview · OpenAI · 18.0
36 · Grok 3 Mini · xAI · 16.5
37 · R1 · DeepSeek · 15.8
38 · o1-mini · OpenAI · 14.0
39 · Qwen3 235B A22B Instruct 2507 · Alibaba Qwen · 11.0
40 · GPT-4.5 · OpenAI · 10.3
41 · GPT-4.1 · OpenAI · 5.5
42 · Grok 3 · xAI · 5.5
43 · Magistral Small 1.1 · 5.0
44 · GPT-4o (2024-11-20) · OpenAI · 4.5
45 · Llama 4 Maverick · Meta · 4.4
46 · GPT-4.1 Mini · OpenAI · 3.5
47 · Llama 4 Scout · Meta · 0.5
48 · GPT-4.1 Nano · OpenAI · 0.1

Pulled from the ARC-AGI dataset · updated daily

What does ARC-AGI measure?

ARC-AGI is a reasoning benchmark in the BenchGecko catalog. 48 AI models have been tested on it. Scores range from 0.1 to 98.0 out of 100.

Which model leads on ARC-AGI?

Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 48 tested models is 42.8.

Is ARC-AGI saturated?

Yes · the top model on ARC-AGI has reached 98.0 out of 100, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
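The saturation rule described above can be sketched in a few lines of Python. This is an assumed reading of the page's criterion (top score within 5% of the ceiling), not BenchGecko's actual implementation; the function name and default thresholds are illustrative.

```python
def is_saturated(top_score: float, ceiling: float = 100.0, margin: float = 0.05) -> bool:
    # A benchmark counts as "approaching saturation" once the best
    # score is within `margin` (here 5%) of the theoretical ceiling.
    return top_score >= ceiling * (1 - margin)

print(is_saturated(98.0))  # ARC-AGI's current top score -> True
print(is_saturated(42.8))  # a median-level score -> False
```

Under this rule, any score of 95.0 or above on a 0–100 benchmark would be flagged as near-saturated.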

Does ARC-AGI predict performance on other benchmarks?

Yes · ARC-AGI scores correlate 0.94 with Cybench across 13 shared models. Models that do well on ARC-AGI tend to do well on Cybench.
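The page does not show how that correlation is computed, but a Pearson r over the models shared by two benchmarks can be sketched as below. The paired scores here are hypothetical placeholders, not the real ARC-AGI/Cybench data.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    # Pearson correlation over paired scores: each (x, y) pair is one
    # model's score on benchmark A and benchmark B respectively.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired scores for five shared models (for illustration only):
arc_scores = [98.0, 86.2, 65.7, 41.0, 15.8]
other_scores = [90.0, 80.5, 60.1, 38.0, 20.2]
print(pearson_r(arc_scores, other_scores))
```

An r near 1.0, as reported for ARC-AGI and Cybench, means models' rankings on one benchmark largely carry over to the other.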

How often is ARC-AGI data refreshed?

BenchGecko pulls updates daily. New model scores on ARC-AGI appear as soon as they are published by Epoch AI or the model provider.

Same category · related evaluations