LiveBench · Reasoning

Updated 2026-04-07
Models tested: 29
Top score: 84.6 (GPT-5.1-Codex-Max)
Median: 59.3 (min 17.4)
Top-5 spread: σ 3.7 (competitive)
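
The headline figures above can be recomputed from a plain list of scores. The following is a minimal sketch, assuming that "Top-5 spread" means the population standard deviation of the five highest scores; the page does not define it, so treat that as a guess. The function name `summarize` and the example scores are hypothetical and are not taken from the leaderboard.

```python
import statistics

def summarize(scores, top_n=5):
    """Return headline stats for a list of benchmark scores (0 to 100)."""
    ranked = sorted(scores, reverse=True)
    top = ranked[:top_n]
    return {
        "models_tested": len(ranked),
        "top_score": ranked[0],
        "median": statistics.median(ranked),
        "min": ranked[-1],
        # Assumption: "spread" is the population standard deviation of the top N.
        "top_n_spread": statistics.pstdev(top),
    }

# Hypothetical scores, not real leaderboard data.
example = summarize([80.0, 78.0, 76.0, 74.0, 72.0, 50.0, 20.0])
print(example)
```

If the site uses the sample standard deviation instead, `statistics.stdev` would be the matching call.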

Best score over time · one chart, every benchmark

[Chart: LiveBench · Reasoning · 29 models · frontier running max. Y axis: score (0 to 100). X axis: release date (Jul 25 to Apr 26). Source: benchgecko.ai/benchmark/livebench-reasoning]
The frontier on LiveBench · Reasoning rose from 58.4 to 84.6 in 5 months · a gain of about 26 points · latest leader GPT-5.1-Codex-Max from OpenAI.
Pink dots = frontier records · 6 total
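
A "frontier running max" chart of this kind can be built by walking the models in release order and keeping only those that beat every earlier score. The sketch below assumes that a record requires a strictly higher score than the previous best; the function name `frontier_records`, the model names, and the data are all hypothetical.

```python
from datetime import date

def frontier_records(releases):
    """releases: iterable of (release_date, model_name, score).

    Returns the entries that set a new best score at release time,
    ordered by release date.
    """
    records = []
    best = float("-inf")
    for released, name, score in sorted(releases, key=lambda r: r[0]):
        if score > best:  # assumption: ties do not count as a new record
            best = score
            records.append((released, name, score))
    return records

# Hypothetical releases, not real leaderboard data.
releases = [
    (date(2025, 1, 10), "model-a", 50.0),
    (date(2025, 3, 5), "model-b", 47.5),
    (date(2025, 6, 20), "model-c", 62.0),
    (date(2025, 9, 1), "model-d", 62.0),
]
records = frontier_records(releases)
gain = records[-1][2] - records[0][2]  # frontier gain from first to latest record
print(records, gain)
```

The gain between the first and last record corresponds to the "rose from ... to ..." figure in the caption.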

Where models cluster

[Chart: score distribution · models per 10-point score bucket (0 to 100)]
0–10: 0 · 10–20: 1 · 20–30: 2 · 30–40: 4 · 40–50: 2 · 50–60: 9 · 60–70: 5 · 70–80: 4 · 80–90: 2 · 90–100: 0 · median 59.3
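
For readers who want to reproduce a distribution like this one, here is a minimal sketch of bucketing scores into 10-point bins. It assumes half-open buckets (a score of exactly 20 falls in 20-30) with a perfect 100 placed in the last bucket; the page does not state its edge rule. The function name `bucket_counts` is hypothetical.

```python
def bucket_counts(scores, width=10, top=100):
    """Count scores per bucket of the given width over the range 0 to top."""
    counts = {f"{lo}-{lo + width}": 0 for lo in range(0, top, width)}
    for s in scores:
        # Clamp so that a score equal to `top` lands in the final bucket.
        lo = min(int(s // width) * width, top - width)
        counts[f"{lo}-{lo + width}"] += 1
    return counts

# The minimum, median, and top scores quoted on this page, plus a perfect score.
counts = bucket_counts([17.4, 59.3, 84.6, 100])
print(counts)
```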

Pearson r · original research

29 models tested · sorted by score

Pulled from the LiveBench · Reasoning dataset · updated daily

What does LiveBench · Reasoning measure?

LiveBench · Reasoning is a knowledge benchmark in the BenchGecko catalog. 29 AI models have been tested on it. Scores range from 17.4 to 84.6 out of 100.

Which model leads on LiveBench · Reasoning?

GPT-5.1-Codex-Max from OpenAI leads LiveBench · Reasoning with a score of 84.6. The median score across 29 tested models is 59.3.

Is LiveBench · Reasoning saturated?

No · the top score is 84.6 out of 100 (85%). There is still meaningful room for improvement on LiveBench · Reasoning.
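
The arithmetic behind that answer is short enough to sketch. The 95% cutoff used below is a hypothetical threshold chosen for illustration; the page does not say what level it treats as saturated.

```python
def percent_of_max(top_score, max_score=100.0):
    """Top score as a whole-number percentage of the maximum."""
    return round(top_score / max_score * 100)

def headroom(top_score, max_score=100.0):
    """Points remaining between the top score and the maximum."""
    return round(max_score - top_score, 1)

def is_saturated(top_score, max_score=100.0, threshold=0.95):
    # Assumption: "saturated" means the leader is at or above `threshold`
    # of the maximum. The 0.95 default is illustrative only.
    return top_score / max_score >= threshold

print(percent_of_max(84.6), headroom(84.6), is_saturated(84.6))
```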

Does LiveBench · Reasoning predict performance on other benchmarks?

Yes · LiveBench · Reasoning scores have a Pearson correlation of 0.92 with LiveBench · Overall across 29 shared models. Models that do well on LiveBench · Reasoning tend to do well on LiveBench · Overall.
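
A correlation like this is computed over the models that have a score on both benchmarks. The sketch below implements the standard Pearson formula by hand; the helper name `pearson_r` and the paired scores are hypothetical, and the site's exact pipeline (for example, how it handles missing scores) is not documented here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (spread_x * spread_y)

# Hypothetical paired scores for models present on both benchmarks.
reasoning = [30.0, 45.0, 60.0, 75.0]
overall = [35.0, 42.5, 50.0, 57.5]
r = pearson_r(reasoning, overall)
print(round(r, 2))
```

On Python 3.10 or later, `statistics.correlation` from the standard library computes the same quantity.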

How often is LiveBench · Reasoning data refreshed?

BenchGecko pulls updates daily. New model scores on LiveBench · Reasoning appear as soon as they are published by Epoch AI or the model provider.
