#	Model	Provider	Score	Price
1	Qwen3.5 397B A17B	Alibaba Qwen	87.6	$0.39
2	Kimi K2.5	moonshotai	86.2	$0.38
3	DeepSeek V3.2	DeepSeek	85.8	$0.26
4	Gemini 2.5 Pro	Google DeepMind	85.8	$1.25
5	DeepSeek V3.2 Speciale	DeepSeek	85.5	$0.40
6	GLM 5	z-ai	85.2	$0.72
7	Kimi K2 Thinking	moonshotai	84.3	$0.60
8	GLM 4.7	z-ai	84.0	$0.39
9	Qwen3 235B A22B Thinking 2507	Alibaba Qwen	83.5	$0.15
10	R1 0528	DeepSeek	83.5	$0.50
11	Step 3.5 Flash	stepfun	83.5	$0.10
12	MiMo-V2-Flash	xiaomi	83.1	$0.09
13	Claude Sonnet 4	Anthropic	83.0	$3.00
14	GLM 4.6	z-ai	83.0	$0.39
15	GLM 4.5	z-ai	82.7	$0.60
16	Qwen3 Next 80B A3B Thinking	Alibaba Qwen	82.0	$0.10
17	MiniMax M2.5	minimax	81.7	$0.12
18	MiniMax M2	minimax	81.6	$0.26
19	Qwen3 Next 80B A3B Instruct	Alibaba Qwen	81.3	$0.09
20	LongCat Flash Chat	meituan	81.0	$0.20
21	gpt-oss-120b (free)	OpenAI	79.7	$0.00
22	Qwen3 30B A3B Thinking 2507	Alibaba Qwen	79.5	$0.08
23	Qwen3 235B A22B Instruct 2507	Alibaba Qwen	79.2	$0.07
24	Qwen3 32B	Alibaba Qwen	78.0	$0.08
25	Qwen3 30B A3B Instruct 2507	Alibaba Qwen	73.9	$0.09
26	Hunyuan A13B Instruct	tencent	73.8	$0.14
27	gpt-oss-20b (free)	OpenAI	72.8	$0.00
28	Qwen3 4B Thinking 2507	Alibaba	72.8	n/a
29	Qwen3 8B	Alibaba Qwen	72.1	$0.05
30	ERNIE 4.5 21B A3B Thinking	baidu	70.8	$0.07
31	Gemma 3 27B	Google DeepMind	67.8	$0.08
32	Qwen3 4B Instruct 2507	Alibaba	63.0	n/a

Frequently asked

Pulled from the OpenCompass · MMLU-Pro dataset · updated daily

What does OpenCompass · MMLU-Pro measure?

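OpenCompass · MMLU-Pro is a knowledge benchmark in the BenchGecko catalog. MMLU-Pro is a multiple-choice test of knowledge and reasoning across a broad set of academic and professional subjects. 32 AI models have been tested on it, with scores ranging from 63.0 to 87.6 out of 100.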

Which model leads on OpenCompass · MMLU-Pro?

Qwen3.5 397B A17B from Alibaba Qwen leads OpenCompass · MMLU-Pro with a score of 87.6. The median score across 32 tested models is 81.8.
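
The summary figures quoted in these answers can be rechecked from the leaderboard table. The sketch below is only an illustration, not BenchGecko's own code: the `summarize` helper is a hypothetical name, and the scores are copied by hand from the table. With 32 models the median is the midpoint of the 16th and 17th scores (82.0 and 81.7), i.e. 81.85, which the page displays as 81.8.

```python
from statistics import median

# Scores copied by hand from the leaderboard table, in rank order.
scores = [
    87.6, 86.2, 85.8, 85.8, 85.5, 85.2, 84.3, 84.0,
    83.5, 83.5, 83.5, 83.1, 83.0, 83.0, 82.7, 82.0,
    81.7, 81.6, 81.3, 81.0, 79.7, 79.5, 79.2, 78.0,
    73.9, 73.8, 72.8, 72.8, 72.1, 70.8, 67.8, 63.0,
]


def summarize(values):
    """Hypothetical helper: count, range, and median of a list of scores."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        # For an even count, median() averages the two middle values.
        "median": median(values),
    }


summary = summarize(scores)
print(summary)
```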

Is OpenCompass · MMLU-Pro saturated?

No · the top score is 87.6 out of 100 (88%), which leaves 12.4 points of headroom on OpenCompass · MMLU-Pro.

Does OpenCompass · MMLU-Pro predict performance on other benchmarks?

Yes, at least for GPQA Diamond · OpenCompass · MMLU-Pro scores correlate 0.98 with GPQA Diamond across 10 shared models. Models that do well on OpenCompass · MMLU-Pro tend to do well on GPQA Diamond, though 10 models is a small sample.
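
For readers who want to see what a correlation "across shared models" involves, here is a minimal sketch. It assumes the figure is a Pearson correlation computed only over models that appear on both leaderboards; the page does not state which method BenchGecko uses. The function name `pearson_on_shared` and the toy scores are hypothetical placeholders, not real benchmark data.

```python
from math import sqrt


def pearson_on_shared(scores_a, scores_b):
    """Pearson correlation over the models present in both score dicts.

    Returns (r, n), where n is the number of shared models.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")
    xs = [scores_a[m] for m in shared]
    ys = [scores_b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y), n


# Toy placeholder data (NOT real scores): only m2, m3, m4 are shared.
benchmark_a = {"m1": 60.0, "m2": 70.0, "m3": 80.0, "m4": 90.0}
benchmark_b = {"m2": 35.0, "m3": 40.0, "m4": 45.0, "m5": 50.0}

r, n_shared = pearson_on_shared(benchmark_a, benchmark_b)
print(r, n_shared)
```

Models missing from either leaderboard are dropped before the calculation, which is why the reported figure covers 10 models rather than all 32.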

How often is OpenCompass · MMLU-Pro data refreshed?

BenchGecko pulls updates daily. New model scores on OpenCompass · MMLU-Pro appear as soon as they are published by Epoch AI or the model provider.