Benchmark · Knowledge · Settled

HELM · IFEval

Updated: 2026-01-21
Models tested: 34
Top score: 95.1 (Grok 3 Mini Beta)
Median: 84.0 (min 73.2)
Top-5 spread: σ 0.9 (Settled)

Best score over time · one chart, every benchmark

[Chart: HELM · IFEval · 30 models · frontier running max. Y-axis: score, 0 to 100, higher is better. X-axis: release date, Jul 24 to Jan 26. Source: benchgecko.ai/benchmark/helm-ifeval]
Frontier on HELM · IFEval rose from 78.2 to 95.1 in 9 months · +16.9 points · latest leader Grok 3 Mini Beta from xAI.
Pink dots = frontier records · 5 total
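
For readers who want to reproduce a "frontier running max" line, the sketch below shows one plausible way to derive frontier records from a list of results: sort by release date and keep each result that beats every earlier score. Only the endpoint scores 78.2 and 95.1 come from the caption above. The function name, the model names, the dates, and the other scores are hypothetical placeholders, and this is not necessarily how the chart itself is computed.

```python
from datetime import date


def frontier_records(results):
    """Return the results that set a new best score, in release-date order.

    `results` is an iterable of (release_date, model_name, score) tuples.
    """
    records = []
    best = float("-inf")
    for released, model, score in sorted(results):
        if score > best:  # strictly better than everything released earlier
            best = score
            records.append((released, model, score))
    return records


# Hypothetical sample data: only 78.2 and 95.1 are taken from the caption;
# every name, date, and other score is a placeholder.
results = [
    (date(2024, 7, 1), "model-a", 78.2),
    (date(2024, 10, 1), "model-b", 75.0),   # below the frontier, not a record
    (date(2025, 1, 1), "model-c", 88.0),
    (date(2025, 4, 1), "model-d", 95.1),
    (date(2025, 6, 1), "model-e", 90.3),    # released later but scores lower
]

records = frontier_records(results)
gain = round(records[-1][2] - records[0][2], 1)

for released, model, score in records:
    print(released.isoformat(), model, score)
print("frontier gain:", gain)
```

With this sample, three of the five results are frontier records, and the gain between the first and last record matches the +16.9 points quoted in the caption. The real chart reports five records over its own data.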

Same category · related evaluations