#	Model · Provider	Score (out of 100)	Price
1	GPT-5.1-Codex-Max · OpenAI	56.7	$1.25
2	GLM 5 · z-ai	55.0	$0.72
3	GLM 5.1 · z-ai	55.0	$0.95
4	Qwen3.6 Plus · Alibaba Qwen	55.0	$0.33
5	GPT-5.1-Codex · OpenAI	53.3	$1.25
6	GPT-5.2-Codex · OpenAI	51.7	$1.75
7	MiniMax M2.5 · minimax	51.7	$0.12
8	MiniMax M2.7 · minimax	50.0	$0.30
9	DeepSeek V3.2 · DeepSeek	46.7	$0.26
10	Devstral 2 2512 · Mistral AI	43.3	$0.40
11	GLM 4.7 · z-ai	41.7	$0.39
12	Gemma 4 31B · Google DeepMind	40.0	$0.13
13	GPT-5.1-Codex-Mini · OpenAI	40.0	$0.25
14	Kimi K2 Thinking · moonshotai	38.3	$0.60
15	DeepSeek V3.2 Exp · DeepSeek	36.7	$0.27
16	GLM 4.6 · z-ai	35.0	$0.39
17	GPT-5 Mini · OpenAI	35.0	$0.25
18	MiMo-V2-Pro · xiaomi	30.0	$1.00
19	GPT-5 Nano · OpenAI	28.3	$0.05
20	GPT-5.4 Nano · OpenAI	27.1	$0.20
21	Nemotron 3 Super · NVIDIA	23.0	$0.10
22	GPT-5.4 Mini · OpenAI	17.0	$0.75
23	gpt-oss-120b · OpenAI	16.7	$0.04
24	Qwen3 235B A22B Instruct 2507 · Alibaba Qwen	13.3	$0.07
25	Qwen3 Next 80B A3B Instruct · Alibaba Qwen	10.0	$0.09
26	Qwen3 Next 80B A3B Thinking · Alibaba Qwen	8.3	$0.10
27	Qwen3 235B A22B Thinking 2507 · Alibaba Qwen	6.7	$0.15
28	GLM 4.6V · z-ai	3.3	$0.30
29	GLM 5V Turbo · z-ai	3.3	$1.20

Frequently asked

Pulled from the LiveBench · Agentic Coding dataset · updated daily

What does LiveBench · Agentic Coding measure?

LiveBench · Agentic Coding is a knowledge benchmark in the BenchGecko catalog. 29 AI models have been tested on it. Scores range from 3.3 to 56.7 out of 100.

Which model leads on LiveBench · Agentic Coding?

GPT-5.1-Codex-Max from OpenAI leads LiveBench · Agentic Coding with a score of 56.7. The median score across 29 tested models is 36.7.
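
The summary figures quoted in this FAQ (29 models, a range of 3.3 to 56.7, a median of 36.7) can be rechecked from the Score column of the table above. The following is a minimal sketch, assuming the scores are copied by hand in rank order; the variable names `scores` and `summary` are illustrative and not part of any BenchGecko tooling.

```python
from statistics import median

# Scores copied from the Score column of the table above, in rank order.
scores = [
    56.7, 55.0, 55.0, 55.0, 53.3, 51.7, 51.7, 50.0, 46.7, 43.3,
    41.7, 40.0, 40.0, 38.3, 36.7, 35.0, 35.0, 30.0, 28.3, 27.1,
    23.0, 17.0, 16.7, 13.3, 10.0, 8.3, 6.7, 3.3, 3.3,
]

# With an odd number of entries, the median is the middle value
# of the sorted list (here the 15th of 29).
summary = {
    "count": len(scores),
    "min": min(scores),
    "max": max(scores),
    "median": median(scores),
}
print(summary)
```

With 29 entries the median is simply the 15th-ranked score, which corresponds to DeepSeek V3.2 Exp in the table.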

Is LiveBench · Agentic Coding saturated?

No. The top score is 56.7 out of 100 (57%), so there is still meaningful room for improvement on LiveBench · Agentic Coding.

Does LiveBench · Agentic Coding predict performance on other benchmarks?

Tentatively, yes. LiveBench · Agentic Coding scores correlate 0.93 with ARC-AGI-2 across 6 shared models, so models that do well on one tend to do well on the other. With only 6 models in common, treat this as a rough signal rather than a firm prediction.
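
The page does not say which correlation coefficient lies behind the 0.93 figure. As a rough sketch, assuming it is a Pearson correlation over the paired scores of the shared models, the calculation might look like the following. The function name `pearson` and the toy inputs are hypothetical, and the toy numbers are not real benchmark scores.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Toy inputs only; these are NOT real benchmark scores.
toy_a = [1.0, 2.0, 3.0, 4.0]
toy_b = [2.0, 4.0, 6.0, 8.0]
print(round(pearson(toy_a, toy_b), 6))
```

To reproduce a figure like the one quoted, `xs` and `ys` would hold each shared model's score on the two benchmarks, in the same model order. A coefficient computed from only 6 pairs can shift noticeably when a single model is added or removed.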

How often is LiveBench · Agentic Coding data refreshed?

BenchGecko pulls updates daily. New model scores on LiveBench · Agentic Coding appear with the next daily pull after they are published by Epoch AI or the model provider.