#	Model	Score	Price
1	Kimi K2.5 · moonshotai	93.9	$0.38
2	GLM 5 · z-ai	93.2	$0.72
3	Step 3.5 Flash · stepfun	93.2	$0.10
4	Kimi K2 Thinking · moonshotai	92.4	$0.60
5	DeepSeek V3.2 Speciale · DeepSeek	91.7	$0.40
6	Qwen3.5 397B A17B · Alibaba Qwen	91.5	$0.39
7	MiniMax M2.5 · minimax	91.1	$0.12
8	GLM 4.7 · z-ai	90.2	$0.39
9	gpt-oss-120b (free) · OpenAI	90.2	$0.00
10	LongCat Flash Chat · meituan	90.2	$0.20
11	MiniMax M2 · minimax	90.2	$0.26
12	Gemini 2.5 Pro · Google DeepMind	90.0	$1.25
13	DeepSeek V3.2 · DeepSeek	89.7	$0.26
14	Qwen3 30B A3B Thinking 2507 · Alibaba Qwen	89.7	$0.08
15	MiMo-V2-Flash · xiaomi	89.5	$0.09
16	Qwen3 Next 80B A3B Thinking · Alibaba Qwen	89.5	$0.10
17	gpt-oss-20b (free) · OpenAI	88.9	$0.00
18	GLM 4.6 · z-ai	88.7	$0.39
19	Qwen3 4B Thinking 2507 · Alibaba	88.5	—
20	Claude Sonnet 4 · Anthropic	88.3	$3.00
21	Qwen3 235B A22B Instruct 2507 · Alibaba Qwen	88.3	$0.07
22	Qwen3 235B A22B Thinking 2507 · Alibaba Qwen	87.8	$0.15
23	Qwen3 Next 80B A3B Instruct · Alibaba Qwen	87.6	$0.09
24	Qwen3 32B · Alibaba Qwen	86.0	$0.08
25	Qwen3 8B · Alibaba Qwen	85.6	$0.05
26	GLM 4.5 · z-ai	85.4	$0.60
27	Qwen3 30B A3B Instruct 2507 · Alibaba Qwen	83.9	$0.09
28	Qwen3 4B Instruct 2507 · Alibaba	82.4	—
29	ERNIE 4.5 21B A3B Thinking · baidu	81.2	$0.07
30	Gemma 3 27B · Google DeepMind	81.0	$0.08
31	R1 0528 · DeepSeek	80.0	$0.50
32	Hunyuan A13B Instruct · tencent	60.3	$0.14

Frequently asked

Pulled from the OpenCompass · IFEval dataset · updated daily

What does OpenCompass · IFEval measure?

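OpenCompass · IFEval is an instruction-following benchmark in the BenchGecko catalog. It measures how reliably a model follows explicit, verifiable instructions in a prompt, such as length limits or required formatting. 32 AI models have been tested on it, with scores ranging from 60.3 to 93.9 out of 100.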

Which model leads on OpenCompass · IFEval?

Kimi K2.5 from moonshotai leads OpenCompass · IFEval with a score of 93.9. The median score across 32 tested models is 89.2.
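The summary figures quoted in these answers can be checked against the table with a short Python sketch. The scores are copied from the leaderboard above in rank order; the variable names are illustrative and not part of any BenchGecko API.

```python
from statistics import median

# Scores copied from the leaderboard table above, in rank order (1 to 32).
scores = [
    93.9, 93.2, 93.2, 92.4, 91.7, 91.5, 91.1, 90.2,
    90.2, 90.2, 90.2, 90.0, 89.7, 89.7, 89.5, 89.5,
    88.9, 88.7, 88.5, 88.3, 88.3, 87.8, 87.6, 86.0,
    85.6, 85.4, 83.9, 82.4, 81.2, 81.0, 80.0, 60.3,
]

# With an even number of models, the median is the mean of the
# two middle scores (ranks 16 and 17: 89.5 and 88.9).
summary = {
    "count": len(scores),
    "min": min(scores),
    "max": max(scores),
    "median": round(median(scores), 1),
}
print(summary)
```

Because the list has 32 entries, the median falls between ranks 16 and 17, which is why it does not match any single row in the table.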

Is OpenCompass · IFEval saturated?

No. The top score is 93.9 out of 100 (about 94%), which leaves roughly 6 points of headroom on OpenCompass · IFEval.

Does OpenCompass · IFEval predict performance on other benchmarks?

Yes, to a degree. OpenCompass · IFEval scores have a correlation of 0.90 with LiveBench · Overall across 10 shared models, so models that do well on one tend to do well on the other. With only 10 shared models, treat this as a rough indicator rather than a guarantee.
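A cross-benchmark correlation like this one is computed over the models that appear on both leaderboards. The page does not say which coefficient is used, so the sketch below assumes a plain Pearson correlation. The `pearson` helper and the sample values are hypothetical placeholders, not real benchmark scores.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and sum of squared deviations for each list.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# HYPOTHETICAL placeholder values, one pair per shared model.
# These are NOT real IFEval or LiveBench scores.
benchmark_a = [60.0, 70.0, 80.0, 90.0]
benchmark_b = [30.0, 35.0, 40.0, 45.0]

r = pearson(benchmark_a, benchmark_b)
print(round(r, 2))
```

The placeholder lists are perfectly linear, so they only demonstrate the mechanics. Real leaderboard pairs would give a value below 1, and an estimate from 10 shared models can shift noticeably when a single model is added or removed.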

How often is OpenCompass · IFEval data refreshed?

BenchGecko pulls updates daily. New model scores on OpenCompass · IFEval appear as soon as they are published by Epoch AI or the model provider.