#	Model	Provider	Score	Price
1	DeepSeek V3.2 Speciale	DeepSeek	28.6	$0.40
2	Kimi K2.5	moonshotai	28.6	$0.38
3	GLM 5	z-ai	28.1	$0.72
4	Qwen3.5 397B A17B	Alibaba Qwen	27.5	$0.39
5	GLM 4.7	z-ai	25.4	$0.39
6	DeepSeek V3.2	DeepSeek	23.2	$0.26
7	MiniMax M2.5	minimax	22.2	$0.12
8	Step 3.5 Flash	stepfun	21.6	$0.10
9	Kimi K2 Thinking	moonshotai	21.3	$0.60
10	Gemini 2.5 Pro	Google DeepMind	21.1	$1.25
11	MiMo-V2-Flash	xiaomi	20.5	$0.09
12	GLM 4.6	z-ai	19.3	$0.39
13	Qwen3 235B A22B Thinking 2507	Alibaba Qwen	18.5	$0.15
14	gpt-oss-120b (free)	OpenAI	18.3	$0.00
15	GLM 4.5	z-ai	16.9	$0.60
16	R1 0528	DeepSeek	14.4	$0.50
17	Qwen3 Next 80B A3B Thinking	Alibaba Qwen	13.5	$0.10
18	MiniMax M2	minimax	13.4	$0.26
19	Qwen3 235B A22B Instruct 2507	Alibaba Qwen	12.3	$0.07
20	Qwen3 30B A3B Thinking 2507	Alibaba Qwen	11.7	$0.08
21	gpt-oss-20b (free)	OpenAI	11.6	$0.00
22	Claude Sonnet 4	Anthropic	8.7	$3.00
23	LongCat Flash Chat	meituan	8.5	$0.20
24	Qwen3 32B	Alibaba Qwen	8.5	$0.08
25	Qwen3 Next 80B A3B Instruct	Alibaba Qwen	8.0	$0.09
26	Qwen3 30B A3B Instruct 2507	Alibaba Qwen	7.7	$0.09
27	ERNIE 4.5 21B A3B Thinking	baidu	6.5	$0.07
28	Hunyuan A13B Instruct	tencent	6.0	$0.14
29	Qwen3 4B Thinking 2507	Alibaba	6.0	n/a
30	Qwen3 8B	Alibaba Qwen	5.5	$0.05
31	Qwen3 4B Instruct 2507	Alibaba	5.1	n/a
32	Gemma 3 27B	Google DeepMind	4.2	$0.08

Frequently asked

Pulled from the OpenCompass · HLE dataset · updated daily

What does OpenCompass · HLE measure?

OpenCompass · HLE is a knowledge benchmark in the BenchGecko catalog. 32 AI models have been tested on it. Scores range from 4.2 to 28.6 out of 100.

Which model leads on OpenCompass · HLE?

DeepSeek V3.2 Speciale from DeepSeek leads OpenCompass · HLE with a score of 28.6. The median score across 32 tested models is 13.9.
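
The leader and median figures can be checked directly against the table. The snippet below is a minimal sketch, assuming the 32 scores are copied by hand from the leaderboard in rank order; the `summarize` helper is a hypothetical name used only for illustration, not part of any BenchGecko tooling.

```python
from statistics import median

# Scores copied by hand from the leaderboard table, in rank order (32 rows).
scores = [
    28.6, 28.6, 28.1, 27.5, 25.4, 23.2, 22.2, 21.6,
    21.3, 21.1, 20.5, 19.3, 18.5, 18.3, 16.9, 14.4,
    13.5, 13.4, 12.3, 11.7, 11.6, 8.7, 8.5, 8.5,
    8.0, 7.7, 6.5, 6.0, 6.0, 5.5, 5.1, 4.2,
]


def summarize(values):
    """Return count, min, max and median for a list of scores."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        # With an even count, median() averages the two middle values.
        "median": median(ordered),
    }


summary = summarize(scores)
print(summary)
```

With an even number of models, the median is the midpoint of ranks 16 and 17 (14.4 and 13.5), which is 13.95. The page reports this figure as 13.9.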

Is OpenCompass · HLE saturated?

No · the top score is 28.6 out of 100 (29%). There is still meaningful room for improvement on OpenCompass · HLE.

Does OpenCompass · HLE predict performance on other benchmarks?

Yes · OpenCompass · HLE scores correlate 0.98 with Artificial Analysis · Coding Index across 11 shared models. Models that do well on OpenCompass · HLE tend to do well on Artificial Analysis · Coding Index.
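
A cross-benchmark correlation of this kind is computed only over the models that appear in both result sets. The sketch below shows one way to do that, assuming a Pearson coefficient (the page does not say which coefficient it uses). The 11 paired scores are not listed on this page, so the example runs on toy placeholder values that are not benchmark results; `pearson` and `shared_correlation` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def shared_correlation(bench_a, bench_b):
    """Correlate two {model: score} dicts over the models they share."""
    shared = sorted(bench_a.keys() & bench_b.keys())
    r = pearson([bench_a[m] for m in shared], [bench_b[m] for m in shared])
    return r, len(shared)


# Toy placeholder values only; NOT real benchmark scores.
toy_a = {"model-a": 1.0, "model-b": 2.0, "model-c": 3.0, "model-d": 9.0}
toy_b = {"model-a": 2.0, "model-b": 4.0, "model-c": 6.0}

r, n_shared = shared_correlation(toy_a, toy_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Here `model-d` is dropped because it has a score on only one side, which mirrors how the 0.98 figure covers only the 11 models tested on both benchmarks rather than all 32.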

How often is OpenCompass · HLE data refreshed?

BenchGecko pulls updates daily. New model scores on OpenCompass · HLE appear in the next daily pull after they are published by OpenCompass or the model provider.