Benchmark · Knowledge · Competitive

LiveBench · Reasoning

Updated 2026-04-07
Models tested: 29
Top score: 84.6 (GPT-5.1-Codex-Max)
Median: 59.3 (min 17.4)
Top-5 spread: σ 3.7 (Competitive)

Best score over time · one chart, every benchmark

[Chart: LiveBench · Reasoning, 29 models, frontier running max. Y-axis: score, 0 to 100, higher is better. X-axis: release date, Jul 25 to Apr 26. Source: benchgecko.ai/benchmark/livebench-reasoning]
The frontier on LiveBench · Reasoning rose from 58.4 to 84.6 in 5 months (+26.1 points). The latest leader is GPT-5.1-Codex-Max from OpenAI.
Pink dots = frontier records · 6 total
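A "frontier running max" of this kind can be sketched as follows: sort results by release date and keep each entry that beats every earlier score. This is only an illustration, not the site's actual method. The function name `frontier_records`, the model names, the dates, and the scores are all hypothetical placeholders and are not taken from the leaderboard.

```python
from datetime import date


def frontier_records(results):
    """Return the entries that set a new best score, in release order.

    `results` is an iterable of (model_name, release_date, score) tuples.
    """
    records = []
    best = float("-inf")
    # Walk the models in release order and keep only strict improvements.
    for name, released, score in sorted(results, key=lambda r: r[1]):
        if score > best:
            best = score
            records.append((name, released, score))
    return records


# Hypothetical placeholder data (NOT real leaderboard values).
results = [
    ("model-a", date(2025, 7, 1), 50.0),
    ("model-b", date(2025, 8, 15), 45.0),   # below the frontier, not a record
    ("model-c", date(2025, 9, 10), 62.5),
    ("model-d", date(2025, 11, 5), 60.0),   # below the frontier, not a record
    ("model-e", date(2025, 12, 20), 70.0),
]

records = frontier_records(results)
# Frontier gain: latest record score minus the first record score.
gain = records[-1][2] - records[0][2]
print(len(records), "records, frontier gain", gain)
```

Under these assumptions, the record count corresponds to the number of frontier dots on the chart, and the gain corresponds to the point increase quoted in the summary line.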

Same category · related evaluations