Which model leads on Artificial Analysis · Agentic Index?

GPT-5.4 from OpenAI leads Artificial Analysis · Agentic Index with a score of 69.4. The median score across 62 tested models is 38.8.

Is Artificial Analysis · Agentic Index saturated?

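Unclear · the top model on Artificial Analysis · Agentic Index scores 69.4, which is above the listed maximum of 60, so the two figures are inconsistent and the "within 5% of the theoretical ceiling" test cannot be applied reliably. With a median of 38.8 across 62 models, most of the field still sits well below the top score.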
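As a rough sketch, the saturation rule mentioned above (top score within 5% of the maximum) can be written as a ratio test. The function name, the `margin` parameter, and the status labels are assumptions made for illustration, not BenchGecko's implementation. The sketch also guards against a top score above the listed maximum, which is the case for the 69.4 and 60 figures on this page.

```python
def saturation_status(top_score: float, max_score: float, margin: float = 0.05) -> str:
    """Classify a benchmark by how close its top score is to the ceiling.

    Hypothetical helper: the 5% margin comes from the page text,
    everything else is an illustrative assumption.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    ratio = top_score / max_score
    if ratio > 1.0:
        # A score above the ceiling means the two inputs disagree.
        return "inconsistent"
    if ratio >= 1.0 - margin:
        return "saturated"
    return "open"


print(saturation_status(69.4, 60))    # figures as listed on this page -> inconsistent
print(saturation_status(96.0, 100))   # made-up example -> saturated
```

Returning a separate "inconsistent" status, rather than treating any ratio above 0.95 as saturated, keeps a data-entry problem from being reported as a property of the benchmark.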

Does Artificial Analysis · Agentic Index predict performance on other benchmarks?

Yes · Artificial Analysis · Agentic Index scores correlate 0.96 with GeoBench across 5 shared models. Models that do well on Artificial Analysis · Agentic Index tend to do well on GeoBench, although five shared models is a small sample for a correlation.

How often is Artificial Analysis · Agentic Index data refreshed?

BenchGecko pulls updates daily. New model scores on Artificial Analysis · Agentic Index appear as soon as they are published by Artificial Analysis or the model provider.

Benchmark · Knowledge · Settled

Artificial Analysis · Agentic Index

Name: Artificial Analysis · Agentic Index Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

The Artificial Analysis Agentic Index is a composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations. It is the canonical single-number answer to "how good is this model as an agent?"

Updated 2026-04-07

The agentic index shows the fastest frontier progression of any composite metric. Models released 6 months apart show 15-20 point gaps, reflecting rapid improvements in tool use and planning.

Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
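The page does not publish the component weights, so the following is only a minimal sketch of how a weighted composite of this kind could be computed. The function name, the weights, and the component scores are all hypothetical placeholders, not values from Artificial Analysis.

```python
def weighted_composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted mean of component scores (higher is better)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the weight total means the weights need not sum to 1.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical weights and component scores, for illustration only.
weights = {"swe_bench": 0.4, "tool_use": 0.2, "planning": 0.2, "error_recovery": 0.2}
components = {"swe_bench": 70.0, "tool_use": 60.0, "planning": 50.0, "error_recovery": 40.0}

composite = weighted_composite(components, weights)
print(round(composite, 1))  # 58.0
```

Normalizing by the sum of the weights is a design choice that keeps the result on the same scale as the component scores even if the weights are later adjusted.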

Models tested: 62
Top score: 69.4 (GPT-5.4)
Median: 38.8 (min 2.7)
Top-5 spread: σ 2.5 (Competitive)

The Frontier

Best score over time · one chart, every benchmark

Frontier on Artificial Analysis · Agentic Index rose from 2.7 to 69.4 in 13 months · +66.7 points · latest leader GPT-5.4 from OpenAI.

Pink dots = frontier records · 10 total

Full rankings

62 models tested · sorted by score

#	Model	Score	Price
1	GPT-5.4· OpenAI	69.4	$2.50
2	Claude Opus 4.6 (Fast)· Anthropic	67.6	$30.00
3	GLM 5.1· z-ai	67.0	$1.05
4	GLM 5 Turbo· z-ai	63.1	$1.20
5	Claude Sonnet 4.6· Anthropic	63.0	$3.00
6	MiMo-V2-Pro· xiaomi	62.8	$1.00
7	GPT-5.3-Codex· OpenAI	62.2	$1.75
8	Muse Spark· Unknown	62.0	—
9	Qwen3.6 Plus· Alibaba Qwen	61.7	$0.33
10	MiniMax M2.7· minimax	61.5	$0.30
11	GLM 5V Turbo· z-ai	61.1	$1.20
12	Gemini 3.1 Pro Preview· Google DeepMind	59.1	$2.00
13	Kimi K2.5· moonshotai	58.9	$0.44
14	MiMo-V2-Omni· xiaomi	58.6	$0.40
15	Qwen3.5 397B A17B· Alibaba Qwen	55.8	$0.39
16	GPT-5.4 Mini· OpenAI	55.7	$0.75
17	Qwen3.5-27B· Alibaba Qwen	54.6	$0.20
18	Qwen3.5-122B-A10B· Alibaba Qwen	53.0	$0.26
19	DeepSeek V3.2· DeepSeek	52.9	$0.25
20	Step 3.5 Flash· stepfun	52.0	$0.10
21	Qwen3 Max Thinking· Alibaba Qwen	50.1	$0.78
22	Gemini 3 Flash Preview· Google DeepMind	49.7	$0.50
23	Grok 4.1 Fast· xAI	49.3	$0.20
24	GPT-5.4 Nano· OpenAI	49.3	$0.20
25	MiMo-V2-Flash· xiaomi	48.8	$0.09
26	Gemini 3 Pro· Google DeepMind	45.0	—
27	Qwen3.5-35B-A3B· Alibaba Qwen	44.1	$0.15
28	Trinity Large Thinking· arcee-ai	42.6	$0.22
29	Qwen3 Coder Next· Alibaba Qwen	42.1	$0.12
30	Gemma 4 31B (free)· Google DeepMind	40.9	$0.00
31	Mercury 2· inception	39.7	$0.25
32	gpt-oss-120b (free)· OpenAI	37.9	$0.00
33	Qwen3.5-9B· Alibaba Qwen	37.4	$0.10
34	o3· OpenAI	36.1	$2.00
35	Grok Code Fast 1· xAI	35.6	$0.20
36	Solar Pro 3· upstage	34.9	$0.15
37	Gemini 2.5 Pro· Google DeepMind	32.7	$1.25
38	Qwen3.5 4B· Alibaba	32.5	—
39	Gemma 4 26B A4B (free)· Google DeepMind	32.1	$0.00
40	gpt-oss-20b (free)· OpenAI	27.6	$0.00
41	Gemini 3.1 Flash Lite Preview· Google DeepMind	25.7	$0.25
42	Mistral Medium 3.1· Mistral AI	25.3	$0.40
43	Qwen3 Next 80B A3B Instruct (free)· Alibaba Qwen	23.6	$0.00
44	Mistral Small 4· Mistral AI	23.4	$0.15
45	Qwen3.5 2B· Alibaba	23.0	—
46	R1 0528· DeepSeek	20.8	$0.50
47	INTELLECT-3· prime-intellect	19.8	$0.20
48	Qwen3 Coder 480B A35B (free)· Alibaba Qwen	18.3	$0.00
49	Qwen3.5 0.8B· Alibaba	15.9	—
50	Qwen3 Next 80B A3B Instruct· Alibaba Qwen	14.2	$0.09
51	Gemini 2.5 Flash Lite· Google DeepMind	11.7	$0.10
52	NVIDIA Nemotron Nano 9B V2· NVIDIA	9.4	—
53	Llama 4 Maverick· Meta	7.2	$0.15
54	Nanbeige4.1 3B· Nanbeige	7.2	—
55	LFM2.5-1.2B-Thinking (free)· liquid	6.5	$0.00
56	Llama 4 Scout· Meta	5.2	$0.08
57	Command A· Cohere	5.1	$2.50
58	Granite 4.0 Micro· ibm-granite	4.2	$0.02
59	Llama 3.1 Nemotron Ultra 253B v1· NVIDIA	3.8	$0.60
60	LFM2-24B-A2B· liquid	3.7	$0.03
61	LFM2.5-1.2B-Instruct (free)· liquid	3.6	$0.00
62	Phi 4 Mini Instruct· Microsoft	2.7	—
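
The summary figures quoted on this page can be re-derived from the Score column of the rankings above. The sketch below copies the 62 scores in rank order and recomputes the median, the gap between the lowest and highest score, and the spread of the top five. Using the population standard deviation for the top-5 spread is an assumption; the page does not say how σ is calculated.

```python
from statistics import median, pstdev

# Scores copied from the rankings table, rank 1 to rank 62.
scores = [
    69.4, 67.6, 67.0, 63.1, 63.0, 62.8, 62.2, 62.0, 61.7, 61.5,
    61.1, 59.1, 58.9, 58.6, 55.8, 55.7, 54.6, 53.0, 52.9, 52.0,
    50.1, 49.7, 49.3, 49.3, 48.8, 45.0, 44.1, 42.6, 42.1, 40.9,
    39.7, 37.9, 37.4, 36.1, 35.6, 34.9, 32.7, 32.5, 32.1, 27.6,
    25.7, 25.3, 23.6, 23.4, 23.0, 20.8, 19.8, 18.3, 15.9, 14.2,
    11.7, 9.4, 7.2, 7.2, 6.5, 5.2, 5.1, 4.2, 3.8, 3.7,
    3.6, 2.7,
]

model_count = len(scores)
# With an even count, the median is the mean of ranks 31 and 32.
median_score = round(median(scores), 1)
score_range = round(max(scores) - min(scores), 1)
# Population standard deviation of the five highest scores (assumed definition).
top5_spread = round(pstdev(scores[:5]), 2)

print(model_count)   # 62
print(median_score)  # 38.8
print(score_range)   # 66.7
print(top5_spread)   # 2.55
```

The median and the 66.7-point gap match the figures stated on the page. The top-5 spread comes out at about 2.55, close to the σ 2.5 shown, which suggests the page either truncates the value or works from unrounded scores.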

Details

Category: Knowledge
Creator: Artificial Analysis
Max score: 60
Modality: Text
Scoring: Weighted composite of multiple agentic benchmark scores. Components include SWE-bench, tool use, planning, and error recovery. Higher is better.
Models: 62
Updated: 2026-04-07

Tests

Multi-step tool use · Planning · Error recovery · Autonomous task completion

Does not test

Vision · Long context · Knowledge recall · Creative writing

Links

Artificial Analysis

Gecko's Take

“The Agentic Index is the single number that matters most for 2026. If you are building agents, this is your shortlist filter. If you are investing, this predicts which provider captures the agent platform market.”

Related benchmarks