#	Model	Score	Price
1	Qwen3.5 397B A17B· Alibaba Qwen	88.4	$0.39
2	Kimi K2.5· moonshotai	88.1	$0.38
3	GLM 4.7· z-ai	86.9	$0.39
4	DeepSeek V3.2 Speciale· DeepSeek	86.7	$0.40
5	GLM 5· z-ai	85.3	$0.72
6	Gemini 2.5 Pro· Google DeepMind	84.7	$1.25
7	DeepSeek V3.2· DeepSeek	84.6	$0.26
8	MiniMax M2.5· minimax	84.6	$0.12
9	Step 3.5 Flash· stepfun	83.7	$0.10
10	Kimi K2 Thinking· moonshotai	82.7	$0.60
11	MiMo-V2-Flash· xiaomi	82.1	$0.09
12	R1 0528· DeepSeek	80.6	$0.50
13	GLM 4.6· z-ai	80.4	$0.39
14	Qwen3 235B A22B Thinking 2507· Alibaba Qwen	79.8	$0.15
15	GLM 4.5· z-ai	79.5	$0.60
16	gpt-oss-120b (free)· OpenAI	78.9	$0.00
17	MiniMax M2· minimax	78.7	$0.26
18	Qwen3 Next 80B A3B Thinking· Alibaba Qwen	77.0	$0.10
19	Qwen3 235B A22B Instruct 2507· Alibaba Qwen	75.5	$0.07
20	Claude Sonnet 4· Anthropic	74.6	$3.00
21	Qwen3 Next 80B A3B Instruct· Alibaba Qwen	74.1	$0.09
22	LongCat Flash Chat· meituan	70.1	$0.20
23	Qwen3 30B A3B Thinking 2507· Alibaba Qwen	70.1	$0.08
24	gpt-oss-20b (free)· OpenAI	68.9	$0.00
25	Qwen3 32B· Alibaba Qwen	67.3	$0.08
26	Qwen3 4B Thinking 2507· Alibaba	64.7	—
27	Hunyuan A13B Instruct· tencent	64.1	$0.14
28	Qwen3 30B A3B Instruct 2507· Alibaba Qwen	63.6	$0.09
29	ERNIE 4.5 21B A3B Thinking· baidu	60.2	$0.07
30	Qwen3 8B· Alibaba Qwen	59.7	$0.05
31	Qwen3 4B Instruct 2507· Alibaba	52.3	—
32	Gemma 3 27B· Google DeepMind	46.3	$0.08

Frequently asked

Pulled from the OpenCompass · GPQA-Diamond dataset · updated daily

What does OpenCompass · GPQA-Diamond measure?

OpenCompass · GPQA-Diamond is a knowledge benchmark in the BenchGecko catalog. 32 AI models have been tested on it. Scores range from 46.3 to 88.4 out of 100.

Which model leads on OpenCompass · GPQA-Diamond?

Qwen3.5 397B A17B from Alibaba Qwen leads OpenCompass · GPQA-Diamond with a score of 88.4. The median score across 32 tested models is 78.8.
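The summary figures quoted here (32 models, a 46.3 to 88.4 range, a 78.8 median) can be rechecked from the table above. The following is a minimal sketch, assuming the scores are copied by hand from the Score column; the `summarize` helper is a hypothetical name used only for illustration, not part of any BenchGecko tooling.

```python
from statistics import median

# Scores copied from the Score column of the leaderboard table, in rank order.
scores = [
    88.4, 88.1, 86.9, 86.7, 85.3, 84.7, 84.6, 84.6,
    83.7, 82.7, 82.1, 80.6, 80.4, 79.8, 79.5, 78.9,
    78.7, 77.0, 75.5, 74.6, 74.1, 70.1, 70.1, 68.9,
    67.3, 64.7, 64.1, 63.6, 60.2, 59.7, 52.3, 46.3,
]


def summarize(values):
    """Return (count, lowest, highest, median) for a list of scores.

    Hypothetical helper for illustration. The median is rounded to one
    decimal place to match the precision used on the page.
    """
    return len(values), min(values), max(values), round(median(values), 1)


count, lowest, highest, mid = summarize(scores)
print(f"{count} models, range {lowest} to {highest}, median {mid}")
```

With an even number of models, the median is the mean of the two middle scores, here the 16th and 17th ranked entries (78.9 and 78.7).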

Is OpenCompass · GPQA-Diamond saturated?

No. The top score is 88.4 out of 100, which leaves 11.6 points of headroom on OpenCompass · GPQA-Diamond.

Does OpenCompass · GPQA-Diamond predict performance on other benchmarks?

Yes, at least for one closely related benchmark. OpenCompass · GPQA-Diamond scores have a correlation of 0.98 with GPQA diamond scores across the 10 models tested on both, so models that do well on one tend to do well on the other.
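A cross-benchmark correlation of this kind could be computed roughly as sketched below. The page does not say which coefficient is used, so the sketch assumes Pearson's r. The function names are hypothetical, and the model names and scores in the example are placeholders, not real benchmark results.

```python
from math import sqrt


def shared_scores(bench_a, bench_b):
    """Pair up scores for models that appear in both benchmarks.

    bench_a and bench_b map model name -> score. Hypothetical helper.
    """
    shared = sorted(set(bench_a) & set(bench_b))
    return [bench_a[m] for m in shared], [bench_b[m] for m in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder data only: these are NOT real scores from either benchmark.
bench_a = {"model-a": 80.0, "model-b": 70.0, "model-c": 60.0}
bench_b = {"model-a": 40.0, "model-b": 35.0, "model-c": 30.0, "model-d": 10.0}

xs, ys = shared_scores(bench_a, bench_b)
print(f"{len(xs)} shared models, r = {pearson(xs, ys):.2f}")
```

Only models present in both benchmarks enter the calculation, which is why the page reports the number of shared models alongside the coefficient. With as few as 10 shared models, a high value is suggestive rather than conclusive.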

How often is OpenCompass · GPQA-Diamond data refreshed?

BenchGecko pulls updates daily. New model scores on OpenCompass · GPQA-Diamond appear as soon as they are published by Epoch AI or the model provider.