#	Model · Provider	Score	Price
1	GPT-5.1-Codex-Max · OpenAI	84.6	$1.25
2	GPT-5.1-Codex · OpenAI	82.0	$1.25
3	GPT-5.2-Codex · OpenAI	77.7	$1.75
4	Qwen3.6 Plus · Alibaba Qwen	75.8	$0.33
5	MiniMax M2.7 · minimax	74.8	$0.30
6	GLM 5.1 · z-ai	72.5	$0.95
7	MiMo-V2-Pro · xiaomi	69.7	$1.00
8	GLM 5 · z-ai	69.1	$0.72
9	GPT-5.1-Codex-Mini · OpenAI	64.7	$0.25
10	Kimi K2 Thinking · moonshotai	63.5	$0.60
11	GLM 4.6 · z-ai	62.1	$0.39
12	GLM 4.7 · z-ai	59.7	$0.39
13	Gemma 4 31B · Google DeepMind	59.4	$0.13
14	Qwen3 235B A22B Thinking 2507 · Alibaba Qwen	59.4	$0.15
15	MiniMax M2.5 · minimax	59.3	$0.12
16	GPT-5 Mini · OpenAI	58.6	$0.25
17	Qwen3 235B A22B Instruct 2507 · Alibaba Qwen	58.4	$0.07
18	Qwen3 Next 80B A3B Thinking · Alibaba Qwen	58.2	$0.10
19	GLM 5V Turbo · z-ai	56.1	$1.20
20	Qwen3 Next 80B A3B Instruct · Alibaba Qwen	54.8	$0.09
21	DeepSeek V3.2 Exp · DeepSeek	45.5	$0.27
22	DeepSeek V3.2 · DeepSeek	44.3	$0.26
23	gpt-oss-120b · OpenAI	39.2	$0.04
24	GLM 4.6V · z-ai	37.2	$0.30
25	GPT-5 Nano · OpenAI	35.5	$0.05
26	Nemotron 3 Super · NVIDIA	34.4	$0.10
27	Devstral 2 2512 · Mistral AI	27.7	$0.40
28	GPT-5.4 Mini · OpenAI	21.9	$0.75
29	GPT-5.4 Nano · OpenAI	17.4	$0.20

Frequently asked questions

Pulled from the LiveBench · Reasoning dataset · updated daily

What does LiveBench · Reasoning measure?

LiveBench · Reasoning is a knowledge benchmark in the BenchGecko catalog. 29 AI models have been tested on it. Scores range from 17.4 to 84.6 out of 100.

Which model leads on LiveBench · Reasoning?

GPT-5.1-Codex-Max from OpenAI leads LiveBench · Reasoning with a score of 84.6. The median score across 29 tested models is 59.3.
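
The count, range, and median quoted in these answers can be rechecked from the Score column of the table above. The following is a minimal sketch, with the scores copied in by hand; the `summarize` helper is a hypothetical name used only for illustration and is not part of any BenchGecko tooling.

```python
from statistics import median

# Scores copied from the Score column of the table above, in rank order.
scores = [
    84.6, 82.0, 77.7, 75.8, 74.8, 72.5, 69.7, 69.1, 64.7, 63.5,
    62.1, 59.7, 59.4, 59.4, 59.3, 58.6, 58.4, 58.2, 56.1, 54.8,
    45.5, 44.3, 39.2, 37.2, 35.5, 34.4, 27.7, 21.9, 17.4,
]


def summarize(values):
    """Return (count, lowest, highest, median) for a list of scores."""
    return len(values), min(values), max(values), median(values)


count, low, high, mid = summarize(scores)
print(count, low, high, mid)  # prints: 29 17.4 84.6 59.3
```

With 29 entries the median is simply the 15th score in rank order, which is the 59.3 posted by MiniMax M2.5.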

Is LiveBench · Reasoning saturated?

No. The top score is 84.6 out of 100 (about 85%), which leaves 15.4 points of headroom on LiveBench · Reasoning.
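
The arithmetic behind that answer is small enough to sketch directly. The `headroom` function below is a hypothetical helper, and it assumes the benchmark maximum is 100, as the answer states; it does not apply any saturation threshold, because the source does not give one.

```python
def headroom(top_score, max_score=100.0):
    """Return (percent of the maximum reached, points remaining)."""
    percent_reached = 100.0 * top_score / max_score
    points_remaining = max_score - top_score
    return percent_reached, points_remaining


pct, remaining = headroom(84.6)
print(f"{pct:.0f}% of maximum, {remaining:.1f} points remaining")
# prints: 85% of maximum, 15.4 points remaining
```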

Does LiveBench · Reasoning predict performance on other benchmarks?

Yes. LiveBench · Reasoning scores have a correlation of 0.92 with LiveBench · Overall scores across the 29 models tested on both. Models that do well on LiveBench · Reasoning tend to do well on LiveBench · Overall.
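
The page does not say which correlation coefficient is used. Assuming it is the Pearson coefficient, a figure like 0.92 could be computed from two score lists paired by model, roughly as sketched below. The `pearson` function is an illustrative helper, and the input lists are made-up placeholder values, not real LiveBench scores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder values, NOT real benchmark scores.
toy_reasoning = [10.0, 20.0, 30.0, 40.0]
toy_overall = [15.0, 25.0, 35.0, 45.0]
print(round(pearson(toy_reasoning, toy_overall), 2))  # prints: 1.0
```

A value near 1 means the two benchmarks rank and space models in nearly the same way; a rank-based coefficient such as Spearman's would be an equally plausible choice and could give a different number.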

How often is LiveBench · Reasoning data refreshed?

BenchGecko pulls updates daily. New model scores on LiveBench · Reasoning appear as soon as they are published by Epoch AI or the model provider.