
ARC-AGI

ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.

Updated 2026-03-05
Models tested · 48
Top score · 98.0 (Gemini 3.1 Pro Preview)
Median · 42.8 (min 0.1)
Top-5 spread · σ 2.4 (competitive)

Best score over time · one chart, every benchmark

[Chart: best score on ARC-AGI over time · 43 models, frontier running max · score 0–100 vs. release date, Nov 2024–Mar 2026 · benchgecko.ai/benchmark/arc-agi]
Frontier on ARC-AGI rose from 4.5 to 98.0 in 15 months · +93.5 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots = frontier records · 12 total

Where models cluster

SCORE DISTRIBUTION (score bucket, 0 to 100 · models)
0–10 · 8
10–20 · 6
20–30 · 4
30–40 · 4
40–50 · 5
50–60 · 4
60–70 · 6
70–80 · 3
80–90 · 3
90–100 · 5
MEDIAN · 42.8


48 models tested · sorted by score

# · Model · Provider · Score
1 · Gemini 3.1 Pro Preview · Google DeepMind · 98.0
2 · GPT-5.4 Pro · OpenAI · 94.5
3 · Claude Opus 4.6 · Anthropic · 94.0
4 · GPT-5.4 · OpenAI · 93.7
5 · GPT-5.2 Pro · OpenAI · 90.5
6 · Claude Sonnet 4.6 · Anthropic · 86.5
7 · GPT-5.2 · OpenAI · 86.2
8 · Claude Opus 4.5 · Anthropic · 80.0
9 · Gemini 3 Pro · Google DeepMind · 75.0
10 · GPT-5.1 · OpenAI · 72.8
11 · GPT-5 Pro · OpenAI · 70.2
12 · Grok 4 · xAI · 66.7
13 · GPT-5 · OpenAI · 65.7
14 · Kimi K2.5 · Moonshot AI · 65.3
15 · Claude Sonnet 4.5 · Anthropic · 63.7
16 · MiniMax M2.5 · MiniMax · 63.7
17 · o3 · OpenAI · 60.8
18 · o3 Pro · OpenAI · 59.3
19 · o4 Mini · OpenAI · 58.7
20 · DeepSeek V3.2 · DeepSeek · 57.0
21 · GPT-5 Mini · OpenAI · 54.3
22 · Grok 4 Fast · xAI · 48.5
23 · Claude Haiku 4.5 · Anthropic · 47.7
24 · GLM 5 · Z.ai · 44.7
25 · Gemini 2.5 Pro · Google DeepMind · 41.0
26 · Claude Sonnet 4 · Anthropic · 40.0
27 · Claude Opus 4 · Anthropic · 35.7
28 · o3 Mini · OpenAI · 34.5
29 · Gemini 2.5 Flash · Google DeepMind · 32.3
30 · o1 · OpenAI · 30.7
31 · Claude 3.7 Sonnet · Anthropic · 28.6
32 · Gemini 3 Flash Preview · Google DeepMind · 21.5
33 · R1 0528 · DeepSeek · 21.2
34 · GPT-5 Nano · OpenAI · 20.7
35 · o1-preview · OpenAI · 18.0
36 · Grok 3 Mini · xAI · 16.5
37 · R1 · DeepSeek · 15.8
38 · o1-mini · OpenAI · 14.0
39 · Qwen3 235B A22B Instruct 2507 · Alibaba Qwen · 11.0
40 · GPT-4.5 · OpenAI · 10.3
41 · GPT-4.1 · OpenAI · 5.5
42 · Grok 3 · xAI · 5.5
43 · Magistral Small 1.1 · 5.0
44 · GPT-4o (2024-11-20) · OpenAI · 4.5
45 · Llama 4 Maverick · Meta · 4.4
46 · GPT-4.1 Mini · OpenAI · 3.5
47 · Llama 4 Scout · Meta · 0.5
48 · GPT-4.1 Nano · OpenAI · 0.1

Pulled from the ARC-AGI dataset · updated daily

What does ARC-AGI measure?

ARC-AGI is a reasoning benchmark in the BenchGecko catalog. 48 AI models have been tested on it. Scores range from 0.1 to 98.0 out of 100.

Which model leads on ARC-AGI?

Gemini 3.1 Pro Preview from Google DeepMind leads ARC-AGI with a score of 98.0. The median score across 48 tested models is 42.8.

Is ARC-AGI saturated?

Yes · the top model on ARC-AGI has reached 98.0 out of 100, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
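The saturation rule described above can be sketched in a few lines of Python. This is an assumed reading of the page's criterion (top score within 5% of the ceiling), not BenchGecko's actual implementation; the function name and default thresholds are illustrative.

```python
def is_saturated(top_score: float, ceiling: float = 100.0, margin: float = 0.05) -> bool:
    # A benchmark counts as "approaching saturation" once the best
    # score is within `margin` (here 5%) of the theoretical ceiling.
    return top_score >= ceiling * (1 - margin)

print(is_saturated(98.0))  # ARC-AGI's current top score -> True
print(is_saturated(42.8))  # a median-level score -> False
```

Under this rule, any score of 95.0 or above on a 0–100 benchmark would be flagged as near-saturated.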

Does ARC-AGI predict performance on other benchmarks?

Yes · ARC-AGI scores correlate 0.94 with Cybench across 13 shared models. Models that do well on ARC-AGI tend to do well on Cybench.
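The page does not show how that correlation is computed, but a Pearson r over the models shared by two benchmarks can be sketched as below. The paired scores here are hypothetical placeholders, not the real ARC-AGI/Cybench data.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    # Pearson correlation over paired scores: each (x, y) pair is one
    # model's score on benchmark A and benchmark B respectively.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired scores for five shared models (for illustration only):
arc_scores = [98.0, 86.2, 65.7, 41.0, 15.8]
other_scores = [90.0, 80.5, 60.1, 38.0, 20.2]
print(pearson_r(arc_scores, other_scores))
```

An r near 1.0, as reported for ARC-AGI and Cybench, means models' rankings on one benchmark largely carry over to the other.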

How often is ARC-AGI data refreshed?

BenchGecko pulls updates daily. New model scores on ARC-AGI appear as soon as they are published by Epoch AI or the model provider.

Same category · related evaluations