Benchmark · Knowledge · Settled

HELM · MMLU-Pro

Updated 2026-01-21
Models tested: 34
Top score: 90.3 (Gemini 3 Pro)
Median: 78.0 (min 53.7)
Top-5 spread: σ 1.8 (Settled)

Best score over time · one chart, every benchmark

[Chart: HELM · MMLU-Pro · 30 models · frontier running max. Y-axis: score, 0 to 100 (higher is better). X-axis: release date, Jul 24 to Jan 26. Source: benchgecko.ai/benchmark/helm-mmlu-pro]
Frontier on HELM · MMLU-Pro rose from 60.3 to 86.3 in 11 months · +26.0 points · latest leader Gemini 2.5 Pro from Google DeepMind.
Pink dots = frontier records · 11 total
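
As a rough illustration of what a "frontier running max" chart plots, the sketch below walks models in release order and keeps only the ones that beat every earlier score. The function name `frontier_records`, the tuple layout, and the sample rows are hypothetical and made up for illustration; they are not the site's code or its data, and the strict "greater than" tie rule is an assumption.

```python
from datetime import date


def frontier_records(results):
    """Return the entries that set a new best score, in release order.

    `results` is an iterable of (release_date, model, score) tuples.
    Assumption: a tie with the current best does not count as a new record.
    """
    records = []
    best = float("-inf")
    # Walk models by release date; keep any score strictly above the running max.
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, model, score))
    return records


# Made-up placeholder rows, NOT real leaderboard data.
sample = [
    (date(2024, 7, 1), "model-a", 60.0),
    (date(2024, 10, 1), "model-b", 55.0),   # below the running max, not a record
    (date(2025, 2, 1), "model-c", 72.5),
    (date(2025, 6, 1), "model-d", 72.5),    # ties the running max, not a record
    (date(2025, 9, 1), "model-e", 80.0),
]

records = frontier_records(sample)

# Frontier gain: latest record minus the first record.
gain = round(records[-1][2] - records[0][2], 1)

for released, model, score in records:
    print(released.isoformat(), model, score)
print("frontier gain:", gain)
```

The same subtraction applied to the caption's figures gives 86.3 - 60.3 = 26.0 points, matching the stated gain.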

Same category · related evaluations