Benchmark · Knowledge

LiveBench · If

Updated: 2026-04-07
Models tested: 29
Top score: 68.5 (GLM 5.1)
Median: 43.2 (min 13.5)
Top-5 spread: σ 1.4 (settled)
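The summary figures above can be recomputed from a raw score list. The following is a minimal sketch, not BenchGecko's implementation: the function name `summarize` is hypothetical, and using the population standard deviation for the top-5 spread is an assumption, since the page does not say which estimator it uses.

```python
import statistics

def summarize(scores, top_n=5):
    """Summarize a list of benchmark scores (hypothetical helper)."""
    ranked = sorted(scores, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "min": ranked[-1],
        "median": statistics.median(ranked),
        # Assumption: spread = population standard deviation of the top_n scores.
        "top_spread": statistics.pstdev(ranked[:top_n]),
    }

# Illustrative scores only, not the real LiveBench · If data.
summary = summarize([10, 20, 30, 40, 50, 60, 70])
```

A small spread among the top five means the leaders are nearly tied, which is presumably what the "settled" label signals.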

Best score over time · one chart, every benchmark

[Chart: LiveBench · If, 29 models, frontier running max. Y-axis: score (0 to 100). X-axis: release date (Jul 25 to Apr 26). Source: benchgecko.ai/benchmark/livebench-if]
Frontier on LiveBench · If rose from 21.7 to 68.5 in 9 months · +46.7 points · latest leader GLM 5.1 from z-ai.
Pink dots = frontier records · 7 total
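A frontier running max keeps only the models that beat every earlier release. The following is a minimal sketch under stated assumptions: the function name `frontier_records` and the sample rows are hypothetical, and only the 21.7 and 68.5 endpoints and the GLM 5.1 name come from the page.

```python
from datetime import date

def frontier_records(results):
    """Return the (release_date, model, score) rows that set a new best score."""
    records = []
    best = float("-inf")
    # Walk models in release order; on the same date, consider the higher score first.
    for released, model, score in sorted(results, key=lambda r: (r[0], -r[2])):
        if score > best:
            best = score
            records.append((released, model, score))
    return records

# Hypothetical rows for illustration; dates and intermediate models are made up.
sample = [
    (date(2025, 7, 1), "model-a", 21.7),
    (date(2025, 9, 1), "model-b", 18.0),   # below the frontier, so not a record
    (date(2025, 11, 1), "model-c", 40.2),
    (date(2026, 4, 1), "GLM 5.1", 68.5),
]
records = frontier_records(sample)
gain = records[-1][2] - records[0][2]
```

With the rounded endpoints shown on the page the gain comes out at 46.8. The page's +46.7 presumably reflects unrounded scores.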

Where models cluster

Score distribution · models per score bucket (0 to 100) · median 43.2
0–10: 0
10–20: 6
20–30: 5
30–40: 1
40–50: 3
50–60: 6
60–70: 8
70–80: 0
80–90: 0
90–100: 0
Source: benchgecko.ai
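The bucket counts in a distribution like this can be derived from the raw scores. The following is a minimal sketch: the function name `bucket_counts` is hypothetical, and the half-open bucket convention (a score of exactly 20 lands in 20–30, and 100 lands in the last bucket) is an assumption the page does not confirm.

```python
from collections import Counter

def bucket_counts(scores, width=10, top=100):
    """Count scores per [lo, lo + width) bucket, keyed by (lo, hi)."""
    counts = Counter()
    for s in scores:
        # Clamp so that a perfect score falls into the final bucket.
        lo = min(int(s // width) * width, top - width)
        counts[lo] += 1
    return {(lo, lo + width): counts.get(lo, 0) for lo in range(0, top, width)}

# Illustrative scores only; 13.5, 43.2 and 68.5 are the min, median and top
# values from the page, the rest are made up.
hist = bucket_counts([13.5, 43.2, 68.5, 55.0, 100.0])
```

Empty buckets are kept with a count of zero so the result always spans the full 0 to 100 range.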

Pearson r · original research

29 models tested · sorted by score

Pulled from the LiveBench · If dataset · updated daily

What does LiveBench · If measure?

LiveBench · If covers the instruction-following (IF) category of LiveBench, and the BenchGecko catalog files it under Knowledge. 29 AI models have been tested on it, with scores ranging from 13.5 to 68.5 out of 100.

Which model leads on LiveBench · If?

GLM 5.1 from z-ai leads LiveBench · If with a score of 68.5. The median score across 29 tested models is 43.2.

Is LiveBench · If saturated?

No · the top score is 68.5 out of 100 (68.5%), which leaves 31.5 points of headroom. There is still meaningful room for improvement on LiveBench · If.

Does LiveBench · If predict performance on other benchmarks?

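Yes · LiveBench · If scores have a Pearson correlation of r = 0.83 with LiveBench · Overall across 29 shared models. Models that do well on LiveBench · If tend to do well on LiveBench · Overall, although a correlation this size still leaves part of the variation unexplained.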
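For reference, Pearson r between two score lists can be computed as below. This is a minimal sketch using the textbook formula; the function name `pearson_r` is hypothetical and the inputs are illustrative, not the 29 shared models' actual scores.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations, and root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)

# Illustrative inputs: a perfectly linear and a perfectly inverse relationship.
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
```

An r of 0.83 corresponds to r² of about 0.69, so roughly two thirds of the variance in one score is shared linearly with the other.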

How often is LiveBench · If data refreshed?

BenchGecko pulls updates daily. New model scores on LiveBench · If appear as soon as they are published by Epoch AI or the model provider.
