Benchmark · Knowledge · Settled

HELM · GPQA

Updated 2026-01-21
Models tested: 34
Top score: 80.3 (Gemini 3 Pro)
Median: 61.1 (min 30.9)
Top-5 spread: σ 2.2 (Competitive)

Best score over time · one chart, every benchmark

[Chart: HELM · GPQA · 30 models · frontier running max. Y-axis: score, 0 to 100, higher is better. X-axis: release date, Jul 24 to Jan 26. Source: benchgecko.ai/benchmark/helm-gpqa · frontier]
Frontier on HELM · GPQA rose from 36.8 to 79.1 in 13 months · +42.3 points · latest leader GPT-5 Chat from OpenAI.
Pink dots = frontier records · 10 total
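
The frontier line described above is a running maximum: models are ordered by release date, and a model counts as a frontier record only if its score beats every earlier score. Here is a minimal sketch of that calculation, assuming results arrive as (release date, model name, score) tuples. The function name, tuple layout, and sample values are hypothetical placeholders, not data or code from this page.

```python
from datetime import date


def frontier_running_max(results):
    """Return the entries that set a new best score, in release-date order.

    `results` is an iterable of (release_date, model_name, score) tuples.
    This layout is an assumption made for illustration.
    """
    records = []
    best = float("-inf")
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:  # only a strictly higher score sets a new record
            best = score
            records.append((released, model, score))
    return records


# Illustrative placeholder data, NOT actual HELM · GPQA results.
sample = [
    (date(2024, 7, 1), "model-a", 40.0),
    (date(2024, 12, 1), "model-b", 38.0),
    (date(2025, 4, 1), "model-c", 55.0),
    (date(2025, 9, 1), "model-d", 55.0),
    (date(2026, 1, 1), "model-e", 70.0),
]

records = frontier_running_max(sample)
# Frontier gain: latest record minus first record.
gain = records[-1][2] - records[0][2]
```

With real data, the length of `records` would correspond to the count of pink dots, and `gain` to the point increase quoted in the summary line. Whether a tie counts as a new record is a design choice; this sketch requires a strictly higher score.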

Same category · related evaluations