Beta
Benchmark · Reasoning · Stable

ARC-AGI

ARC-AGI is the original Abstraction and Reasoning Corpus. It tests whether AI can solve novel visual pattern-recognition tasks without relying on memorization.

Updated: 2026-03-05
Models tested: 48
Top score: 98.0 (Gemini 3.1 Pro Preview)
Median: 42.8
Min: 0.1
Top 5 spread: σ 2.4 (contested)

Best score over time · one chart, every benchmark

[Chart: ARC-AGI · 43 models · frontier running max. Y-axis: score (0 to 100). X-axis: release date (Nov 24 to Mar 26). Source: benchgecko.ai/benchmark/arc-agi]
Frontier on ARC-AGI rose from 4.5 to 98.0 in 15 months · +93.5 points · latest leader Gemini 3.1 Pro Preview from Google DeepMind.
Pink dots = frontier records · 12 total

48 models tested · sorted by score

| # | Model | Developer | Score |
|---|-------|-----------|-------|
| 1 | Gemini 3.1 Pro Preview | Google DeepMind | 98.0 |
| 2 | GPT-5.4 Pro | OpenAI | 94.5 |
| 3 | Claude Opus 4.6 | Anthropic | 94.0 |
| 4 | GPT-5.4 | OpenAI | 93.7 |
| 5 | GPT-5.2 Pro | OpenAI | 90.5 |
| 6 | Claude Sonnet 4.6 | Anthropic | 86.5 |
| 7 | GPT-5.2 | OpenAI | 86.2 |
| 8 | Claude Opus 4.5 | Anthropic | 80.0 |
| 9 | Gemini 3 Pro | Google DeepMind | 75.0 |
| 10 | GPT-5.1 | OpenAI | 72.8 |
| 11 | GPT-5 Pro | OpenAI | 70.2 |
| 12 | Grok 4 | xAI | 66.7 |
| 13 | GPT-5 | OpenAI | 65.7 |
| 14 | Kimi K2.5 | moonshotai | 65.3 |
| 15 | Claude Sonnet 4.5 | Anthropic | 63.7 |
| 16 | MiniMax M2.5 | minimax | 63.7 |
| 17 | o3 | OpenAI | 60.8 |
| 18 | o3 Pro | OpenAI | 59.3 |
| 19 | o4 Mini | OpenAI | 58.7 |
| 20 | DeepSeek V3.2 | DeepSeek | 57.0 |
| 21 | GPT-5 Mini | OpenAI | 54.3 |
| 22 | Grok 4 Fast | xAI | 48.5 |
| 23 | Claude Haiku 4.5 | Anthropic | 47.7 |
| 24 | GLM 5 | z-ai | 44.7 |
| 25 | Gemini 2.5 Pro | Google DeepMind | 41.0 |
| 26 | Claude Sonnet 4 | Anthropic | 40.0 |
| 27 | Claude Opus 4 | Anthropic | 35.7 |
| 28 | o3 Mini | OpenAI | 34.5 |
| 29 | Gemini 2.5 Flash | Google DeepMind | 32.3 |
| 30 | o1 | OpenAI | 30.7 |
| 31 | Claude 3.7 Sonnet | Anthropic | 28.6 |
| 32 | Gemini 3 Flash Preview | Google DeepMind | 21.5 |
| 33 | R1 0528 | DeepSeek | 21.2 |
| 34 | GPT-5 Nano | OpenAI | 20.7 |
| 35 | o1-preview | OpenAI | 18.0 |
| 36 | Grok 3 Mini | xAI | 16.5 |
| 37 | R1 | DeepSeek | 15.8 |
| 38 | o1-mini | OpenAI | 14.0 |
| 39 | Qwen3 235B A22B Instruct 2507 | Alibaba Qwen | 11.0 |
| 40 | GPT-4.5 | OpenAI | 10.3 |
| 41 | GPT-4.1 | OpenAI | 5.5 |
| 42 | Grok 3 | xAI | 5.5 |
| 43 | Magistral Small 1.1 | (not listed) | 5.0 |
| 44 | GPT-4o (2024-11-20) | OpenAI | 4.5 |
| 45 | Llama 4 Maverick | Meta | 4.4 |
| 46 | GPT-4.1 Mini | OpenAI | 3.5 |
| 47 | Llama 4 Scout | Meta | 0.5 |
| 48 | GPT-4.1 Nano | OpenAI | 0.1 |
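
The summary figures reported on this page (median 42.8, min 0.1, top 5 spread σ 2.4) can be checked against the 48 listed scores. The following is a minimal sketch, not the site's own method: the helper name `summarize` is hypothetical, and it assumes that σ is the population standard deviation of the five highest scores, which is the reading that rounds to 2.4. The midpoint of the two middle scores (44.7 and 41.0) is about 42.85, which the page displays as 42.8.

```python
from statistics import median, pstdev

# Scores copied from the leaderboard, highest first (48 entries).
scores = [
    98.0, 94.5, 94.0, 93.7, 90.5, 86.5, 86.2, 80.0, 75.0, 72.8, 70.2, 66.7,
    65.7, 65.3, 63.7, 63.7, 60.8, 59.3, 58.7, 57.0, 54.3, 48.5, 47.7, 44.7,
    41.0, 40.0, 35.7, 34.5, 32.3, 30.7, 28.6, 21.5, 21.2, 20.7, 18.0, 16.5,
    15.8, 14.0, 11.0, 10.3, 5.5, 5.5, 5.0, 4.5, 4.4, 3.5, 0.5, 0.1,
]


def summarize(values):
    """Return leaderboard summary statistics (hypothetical helper)."""
    ordered = sorted(values, reverse=True)
    return {
        "count": len(ordered),
        "top": ordered[0],
        "min": ordered[-1],
        # With an even number of entries, median() averages the two middle values.
        "median": median(ordered),
        # Assumption: the page's sigma is the population std of the top five scores.
        "top5_sigma": pstdev(ordered[:5]),
    }


summary = summarize(scores)
print(summary)
```

Using the sample standard deviation (`statistics.stdev`) instead would give a larger top 5 spread, so the population form is the assumption that appears to match the page.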
Details
Category: Reasoning
Max score: 100
Models: 48
Updated: 2026-03-05

Same category · related evaluations