Best AI Models for Math
A math-focused composite built from arithmetic, competition-style, and frontier math benchmarks. Saturated benchmarks are still useful signals, but coverage across harder tests gives a clearer picture of confidence.
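For intuition, here is a minimal sketch of how a composite like this could be computed, assuming equal weighting over whichever benchmark scores a model has. The benchmark names and numbers are illustrative placeholders, not BenchGecko's actual weights or data.

```python
# Minimal sketch: an equally weighted composite over available scores.
# Benchmark names and values are illustrative; BenchGecko's actual
# weighting scheme is not published here.

def composite(scores: dict[str, float | None]) -> float | None:
    """Average the available 0-100 benchmark scores, skipping gaps."""
    available = [s for s in scores.values() if s is not None]
    if not available:
        return None
    return sum(available) / len(available)

# Hypothetical model with no FrontierMath result yet.
print(composite({"GSM8K": 96.0, "MATH": 88.5, "AIME": 71.0, "FrontierMath": None}))
# -> 85.1666...
```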
Ranked model table
Scores are based on the visible benchmark set and available metadata.
| Rank | Model | Vendor | Score | Evidence | Input price ($/M tokens) | Context window |
|---|---|---|---|---|---|---|
| #1 | GPT-5.4 Pro | OpenAI | 90.0 | 2 benchmarks · Medium | $30.00 | 1.1M |
| #2 | GPT-5.4 | OpenAI | 82.2 | 3 benchmarks · Medium | $2.50 | 1.1M |
| #3 | Qwen3 Max | Alibaba Qwen | 78.8 | 2 benchmarks · Medium | $0.78 | 262K |
| #4 | R1 0528 | DeepSeek | 75.3 | 2 benchmarks · Medium | $0.50 | 164K |
| #5 | Claude Opus 4.6 | Anthropic | 74.2 | 3 benchmarks · Medium | $5.00 | 1M |
| #6 | GPT-5.2 | OpenAI | 71.3 | 3 benchmarks · Medium | $1.75 | 400K |
| #7 | GPT-5 | OpenAI | 69.6 | 4 benchmarks · High | $1.25 | 400K |
| #8 | gpt-oss-120b | OpenAI | 69.6 | 2 benchmarks · Medium | $0.04 | 131K |
| #9 | R1 | DeepSeek | 67.4 | 2 benchmarks · Medium | $0.70 | 64K |
| #10 | GPT-5 Mini | OpenAI | 65.0 | 5 benchmarks · High | $0.25 | 400K |
| #11 | o4 Mini | OpenAI | 59.5 | 4 benchmarks · High | $1.10 | 200K |
| #12 | o1 | OpenAI | 58.9 | 3 benchmarks · Medium | $15.00 | 200K |
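As one way to use the table, the sketch below filters rows to an input-price budget and re-sorts by score. The row values are copied from a subset of the table above; the $1.00/M budget is only an example.

```python
# Sketch: shortlist models under an input-price budget, sorted by score.
# Rows are (model, score, input $/M tokens, context tokens), copied from
# a subset of the table above; the budget is an arbitrary example.

rows = [
    ("GPT-5.4", 82.2, 2.50, 1_100_000),
    ("Qwen3 Max", 78.8, 0.78, 262_000),
    ("R1 0528", 75.3, 0.50, 164_000),
    ("gpt-oss-120b", 69.6, 0.04, 131_000),
    ("GPT-5 Mini", 65.0, 0.25, 400_000),
]

budget = 1.00  # max input price, $/M tokens
shortlist = sorted((r for r in rows if r[2] <= budget), key=lambda r: -r[1])
for model, score, price, _ in shortlist:
    print(f"{model}: score {score} at ${price}/M")
```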
Math scores depend heavily on benchmark format. Use the linked benchmark pages before choosing a model for high-stakes mathematical work.
BenchGecko ranks models from published benchmark scores and model metadata. Scores do not measure every use case, and missing data can affect rankings.
Related rankings
Best AI Models for Reasoning
Reasoning models ranked from public benchmark scores across GPQA Diamond, BBH, ARC-AGI, SimpleBench, and related tests.
Best AI Models for Coding
Coding models ranked from published coding benchmark scores, listed prices, and model metadata tracked by BenchGecko.
Best Open-weight AI Models
Open-weight AI models ranked from available benchmark data, coverage confidence, pricing metadata, and listed license signals.
Questions
Which benchmarks are used for math rankings?
This page uses GSM8K, MATH-level tasks, AIME-style tasks, and FrontierMath when those scores are available.
Why show confidence labels?
A model with two math scores is less proven than a model with broader evidence. Confidence labels make that coverage visible.
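BenchGecko does not publish its exact thresholds, but the pattern in the table above (2-3 benchmarks labeled Medium, 4-5 labeled High) suggests a simple count-based mapping. The cutoffs below are an assumption that mirrors that visible pattern.

```python
# Assumed mapping from benchmark count to a coverage label. The cutoffs
# mirror the visible table pattern (2-3 -> Medium, 4-5 -> High); they are
# not BenchGecko's documented rules.

def confidence_label(n_benchmarks: int) -> str:
    if n_benchmarks >= 4:
        return "High"
    if n_benchmarks >= 2:
        return "Medium"
    return "Low"

print(confidence_label(5))  # "High", matching GPT-5 Mini's 5-benchmark row
```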
Can this ranking replace manual math validation?
No. Use it as a benchmark-backed shortlist, then validate models on your own problem set.
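Here is a minimal sketch of that validation step, assuming a hypothetical ask_model(prompt) wrapper around whatever API or local runtime you use. The problems and the substring check are placeholders to adapt to your own problem set.

```python
# Sketch of a tiny exact-answer validation harness. ask_model is a
# hypothetical stand-in for your provider's API call; replace it before use.

PROBLEMS = [
    ("What is 17 * 24? Answer with the number only.", "408"),
    ("What is the sum of the first 10 positive integers? Answer with the number only.", "55"),
]

def ask_model(prompt: str) -> str:
    raise NotImplementedError("Wire this to your model provider or local runtime.")

def accuracy() -> float:
    """Fraction of problems whose expected answer appears in the reply."""
    correct = sum(
        1 for question, expected in PROBLEMS
        if expected in ask_model(question)
    )
    return correct / len(PROBLEMS)
```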