Benchmark · Knowledge

HELM · Omni-MATH

Updated 2026-01-21
Models tested · 34
Top score · 72.2 (GPT-5 Mini)
Median · 44.1 (min 22.4)
Top-5 spread · σ 2.6 (competitive)
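These headline figures can be derived from a raw list of scores. The minimal Python sketch below is not BenchGecko's code: the page does not define "top-5 spread", so it assumes the population standard deviation of the five highest scores, and the function name `summary_stats` is hypothetical.

```python
import statistics


def summary_stats(scores: list[float]) -> dict[str, float]:
    """Compute leaderboard summary figures from 0-100 scores.

    Assumption: "top-5 spread" is the population standard deviation
    of the five highest scores (the source page does not define it).
    """
    if len(scores) < 5:
        raise ValueError("need at least five scores")
    ranked = sorted(scores, reverse=True)
    return {
        "models_tested": len(scores),
        "top_score": ranked[0],
        "median": statistics.median(scores),
        "min": ranked[-1],
        "top5_sigma": statistics.pstdev(ranked[:5]),
    }


# Hypothetical scores, not real leaderboard data.
stats = summary_stats([10.0, 20.0, 30.0, 40.0, 50.0])
print(stats["median"])  # 30.0
```

A small σ across the top five, as on this page, means the leading models sit close together; whether a sample or population deviation is used changes the value slightly for n = 5.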

Best score over time · one chart, every benchmark

[Chart: HELM · Omni-MATH frontier running max across 30 models; score (0–100) by release date, Jul 2024 to Jan 2026; source benchgecko.ai/benchmark/helm-omni-math]
Frontier on HELM · Omni-MATH rose from 28.0 to 72.2 in 13 months · +44.2 points · latest leader GPT-5 Mini from OpenAI.
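The frontier running max keeps only the results that beat every earlier-released score. A minimal sketch, assuming each result carries a model name, a release date, and a 0–100 score; the `Result` type and `frontier_records` function are hypothetical illustrations, not BenchGecko's implementation.

```python
from datetime import date
from typing import NamedTuple


class Result(NamedTuple):
    """One model's score on a benchmark (hypothetical record shape)."""
    model: str
    released: date
    score: float


def frontier_records(results: list[Result]) -> list[Result]:
    """Return results that set a new best score, in release-date order.

    Same-day releases are ordered by score (highest first), so only the
    best of them can set a record on that day.
    """
    records: list[Result] = []
    best = float("-inf")
    for r in sorted(results, key=lambda x: (x.released, -x.score)):
        if r.score > best:  # strictly better than everything released before
            best = r.score
            records.append(r)
    return records


# Hypothetical example data, not real leaderboard results.
results = [
    Result("model-a", date(2024, 7, 1), 30.0),
    Result("model-b", date(2024, 9, 1), 25.0),
    Result("model-c", date(2025, 1, 1), 45.0),
    Result("model-d", date(2025, 6, 1), 60.0),
]
records = frontier_records(results)
print([r.model for r in records])            # ['model-a', 'model-c', 'model-d']
print(records[-1].score - records[0].score)  # 30.0
```

The gain quoted in the caption corresponds to the last record's score minus the first record's score, and the number of records corresponds to the count of highlighted frontier points.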
Pink dots = frontier records · 11 total

Where models cluster

Score distribution (models per 10-point bucket, 0 to 100): 20–30 · 6, 30–40 · 7, 40–50 · 11, 50–60 · 3, 60–70 · 4, 70–80 · 3; buckets 0–10, 10–20, 80–90 and 90–100 are empty. Median · 44.1.
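Each bar in the distribution counts models whose score falls in a 10-point bucket. A minimal sketch of that bucketing, assuming half-open buckets [lo, lo + 10) with a perfect 100 counted in the top bucket; the page does not say how bucket edges are handled, and `bucket_counts` is a hypothetical helper.

```python
def bucket_counts(scores: list[float], width: int = 10, top: int = 100) -> dict[str, int]:
    """Count scores per fixed-width bucket on a 0..top scale.

    Buckets are half-open [lo, lo + width); a score equal to `top` is
    placed in the last bucket so that a perfect score is not dropped.
    """
    n_buckets = top // width
    counts = [0] * n_buckets
    for s in scores:
        if not 0 <= s <= top:
            raise ValueError(f"score out of range: {s}")
        idx = min(int(s // width), n_buckets - 1)
        counts[idx] += 1
    # Labels mirror the page's "20–30" style.
    return {f"{i * width}–{(i + 1) * width}": c for i, c in enumerate(counts)}


# Hypothetical scores, not real leaderboard data.
print(bucket_counts([22.4, 44.1, 72.2]))
```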

Pearson r · original research

Correlation analysis

Benchmarks that track with HELM · Omni-MATH

Pearson correlation across models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship between the two sets of scores.
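A minimal sketch of that computation, assuming each benchmark's results are a mapping from model name to score; only models present in both mappings contribute. The function name `pearson_on_shared` is hypothetical. Python 3.10+ also provides `statistics.correlation`, but the formula is written out here to show each term.

```python
import math


def pearson_on_shared(a: dict[str, float], b: dict[str, float]) -> tuple[float, int]:
    """Pearson r between two benchmarks over the models scored on both.

    Returns (r, n) where n is the number of shared models.
    """
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a benchmark has zero variance over the shared models")
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical scores: "x" and "y" each appear on only one benchmark.
bench_a = {"m1": 1.0, "m2": 2.0, "m3": 3.0, "x": 9.0}
bench_b = {"m1": 2.0, "m2": 4.0, "m3": 6.0, "y": 0.0}
print(pearson_on_shared(bench_a, bench_b))  # (1.0, 3)
```

Returning n alongside r matters: with very few shared models, such as the 5 behind the Cybench comparison, a single outlying model can swing r substantially.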

34 models tested · sorted by score

Pulled from the HELM · Omni-MATH dataset · updated daily

What does HELM · Omni-MATH measure?

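HELM · Omni-MATH is Omni-MATH, a benchmark of olympiad-level mathematics problems, as evaluated in Stanford CRFM's HELM framework. BenchGecko lists it under Knowledge. 34 AI models have been tested on it, with scores ranging from 22.4 to 72.2 out of 100.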

Which model leads on HELM · Omni-MATH?

GPT-5 Mini from OpenAI leads HELM · Omni-MATH with a score of 72.2. The median score across 34 tested models is 44.1.

Is HELM · Omni-MATH saturated?

No · the top score is 72.2 out of 100 (72%). There is still meaningful room for improvement on HELM · Omni-MATH.

Does HELM · Omni-MATH predict performance on other benchmarks?

Tentatively · HELM · Omni-MATH scores correlate at r = 0.86 with Cybench, but only 5 models have been scored on both, so the estimate is noisy. Among those shared models, strong HELM · Omni-MATH scores tend to go with strong Cybench scores.

How often is HELM · Omni-MATH data refreshed?

BenchGecko pulls updates daily, so new model scores on HELM · Omni-MATH appear at the next daily pull after Epoch AI or the model provider publishes them.

Same category · related evaluations