#	Model	Provider	Score	Price
1	DeepSeek V3.2 Speciale	DeepSeek	96.0	$0.40
2	GLM 5	z-ai	95.8	$0.72
3	Step 3.5 Flash	stepfun	95.7	$0.10
4	GLM 4.7	z-ai	95.4	$0.39
5	Kimi K2 Thinking	moonshotai	94.1	$0.60
6	gpt-oss-120b (free)	OpenAI	93.4	$0.00
7	DeepSeek V3.2	DeepSeek	93.0	$0.26
8	MiMo-V2-Flash	xiaomi	92.9	$0.09
9	Qwen3.5 397B A17B	Alibaba Qwen	92.3	$0.39
10	Kimi K2.5	moonshotai	91.9	$0.38
11	Qwen3 235B A22B Thinking 2507	Alibaba Qwen	90.9	$0.15
12	GLM 4.6	z-ai	90.3	$0.39
13	Qwen3 Next 80B A3B Thinking	Alibaba Qwen	89.0	$0.10
14	R1 0528	DeepSeek	89.0	$0.50
15	Gemini 2.5 Pro	Google DeepMind	88.7	$1.25
16	gpt-oss-20b (free)	OpenAI	87.9	$0.00
17	Qwen3 30B A3B Thinking 2507	Alibaba Qwen	86.8	$0.08
18	MiniMax M2.5	minimax	86.2	$0.12
19	GLM 4.5	z-ai	85.8	$0.60
20	Qwen3 4B Thinking 2507	Alibaba	80.0	n/a
21	MiniMax M2	minimax	79.1	$0.26
22	ERNIE 4.5 21B A3B Thinking	baidu	76.2	$0.07
23	Qwen3 32B	Alibaba Qwen	70.3	$0.08
24	Qwen3 235B A22B Instruct 2507	Alibaba Qwen	69.5	$0.07
25	Qwen3 Next 80B A3B Instruct	Alibaba Qwen	69.2	$0.09
26	Claude Sonnet 4	Anthropic	68.7	$3.00
27	Qwen3 8B	Alibaba Qwen	66.2	$0.05
28	Hunyuan A13B Instruct	tencent	65.7	$0.14
29	Qwen3 30B A3B Instruct 2507	Alibaba Qwen	63.8	$0.09
30	LongCat Flash Chat	meituan	61.0	$0.20
31	Qwen3 4B Instruct 2507	Alibaba	46.9	n/a
32	Gemma 3 27B	Google DeepMind	22.4	$0.08

Frequently asked questions

Pulled from the OpenCompass · AIME2025 dataset · updated daily

What does OpenCompass · AIME2025 measure?

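OpenCompass · AIME2025 is listed as a knowledge benchmark in the BenchGecko catalog. Its name refers to the 2025 American Invitational Mathematics Examination (AIME), a competition-math exam. 32 AI models have been tested on it, with scores ranging from 22.4 to 96.0 out of 100.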

Which model leads on OpenCompass · AIME2025?

DeepSeek V3.2 Speciale from DeepSeek leads OpenCompass · AIME2025 with a score of 96.0. The median score across 32 tested models is 87.3.
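The summary figures quoted in this FAQ can be rechecked from the leaderboard itself. The following is a minimal sketch, not BenchGecko's own code: the scores are copied from the table above, and the variable names are illustrative. With 32 entries the median is the midpoint of the 16th and 17th scores (87.9 and 86.8), which works out to 87.35; the page appears to display this as 87.3.

```python
import statistics

# Scores copied from the leaderboard above, in rank order (32 models).
scores = [
    96.0, 95.8, 95.7, 95.4, 94.1, 93.4, 93.0, 92.9,
    92.3, 91.9, 90.9, 90.3, 89.0, 89.0, 88.7, 87.9,
    86.8, 86.2, 85.8, 80.0, 79.1, 76.2, 70.3, 69.5,
    69.2, 68.7, 66.2, 65.7, 63.8, 61.0, 46.9, 22.4,
]

model_count = len(scores)                 # number of tested models
top_score = max(scores)                   # leader's score
bottom_score = min(scores)                # lowest score
median_score = statistics.median(scores)  # midpoint of the two middle values

print(model_count, bottom_score, top_score, round(median_score, 2))
```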

Is OpenCompass · AIME2025 saturated?

Yes · the top model on OpenCompass · AIME2025 has reached 96.0 out of 100, which is 4.0 points short of the maximum and therefore within 5% of the theoretical ceiling. The benchmark is approaching saturation and may be replaced by a harder successor.
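The saturation claim comes down to a simple threshold comparison. The sketch below assumes the rule is "top score within 5% of the ceiling", as the answer above states; the function name `is_saturated` and its parameters are hypothetical, and BenchGecko's actual rule may differ.

```python
def is_saturated(top_score: float, ceiling: float = 100.0,
                 margin_pct: float = 5.0) -> bool:
    """Return True when the top score is within margin_pct of the ceiling.

    Assumed rule (not confirmed by the source): the gap to the ceiling
    must be no larger than margin_pct percent of the ceiling.
    """
    gap = ceiling - top_score
    return gap <= ceiling * margin_pct / 100.0


# Top score on this leaderboard: 96.0, so the gap to 100 is 4.0 points.
print(is_saturated(96.0))
```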

Does OpenCompass · AIME2025 predict performance on other benchmarks?

Yes · OpenCompass · AIME2025 scores have a correlation of 0.94 with GPQA Diamond scores across the 10 models tested on both benchmarks. Models that do well on OpenCompass · AIME2025 tend to do well on GPQA Diamond.
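The page does not say which correlation measure is behind the 0.94 figure. Assuming it is the Pearson coefficient, a cross-benchmark correlation over shared models could be computed as sketched below. The function name `pearson` is illustrative, and the sample inputs are synthetic toy values, not real benchmark scores, because the GPQA Diamond scores are not listed here.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Synthetic toy inputs (NOT benchmark data): a perfectly linear relationship.
toy_a = [1.0, 2.0, 3.0, 4.0]
toy_b = [2.0, 4.0, 6.0, 8.0]
print(pearson(toy_a, toy_b))
```

In practice each list would hold one benchmark's scores for the shared models, in the same model order.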

How often is OpenCompass · AIME2025 data refreshed?

BenchGecko pulls updates daily. New model scores on OpenCompass · AIME2025 appear as soon as they are published by OpenCompass or the model provider.