#	Model · Provider	Score	Price
1	Grok 3 Mini Beta · xAI	95.1	$0.30
2	Grok 4 · xAI	94.9	$3.00
3	GPT-5.1 · OpenAI	93.5	$1.25
4	GPT-5 Nano · OpenAI	93.2	$0.05
5	o4 Mini · OpenAI	92.9	$1.10
6	GPT-5 Mini · OpenAI	92.7	$0.25
7	GPT-4.1 Mini · OpenAI	90.4	$0.40
8	Gemini 2.5 Flash · Google DeepMind	89.8	$0.30
9	Grok 3 Beta · xAI	88.4	$3.00
10	Gemini 3 Pro · Google DeepMind	87.6	—
11	Mistral Large 2411 · Mistral AI	87.6	$2.00
12	GPT-5 Chat · OpenAI	87.5	$1.25
13	o3 · OpenAI	86.9	$2.00
14	Claude 3.5 Sonnet · Anthropic	85.6	—
15	Kimi K2 0711 · Moonshot AI	85.0	$0.57
16	GPT-4.1 Nano · OpenAI	84.3	$0.10
17	Gemini 2.0 Flash · Google DeepMind	84.1	$0.10
18	Gemini 2.5 Pro · Google DeepMind	84.0	$1.25
19	GPT-4.1 · OpenAI	83.8	$2.00
20	Gemini 1.5 Pro (Feb 2024) · Google DeepMind	83.7	—
21	gpt-oss-120b · OpenAI	83.6	$0.04
22	Claude 3.7 Sonnet · Anthropic	83.4	$3.00
23	DeepSeek V3 · DeepSeek	83.2	$0.32
24	Gemini 1.5 Flash (May 2024) · Google DeepMind	83.1	—
25	Gemini 2.0 Flash Lite · Google DeepMind	82.4	$0.07
26	Palmyra X5 · Writer	82.3	$0.60
27	GPT-4o (2024-11-20) · OpenAI	81.7	$2.50
28	Gemini 2.5 Flash Lite · Google DeepMind	81.0	$0.10
29	Qwen3 Next 80B A3B Thinking · Alibaba Qwen	81.0	$0.10
30	Claude 3.5 Haiku · Anthropic	79.2	$0.80
31	R1 0528 · DeepSeek	78.4	$0.50
32	GPT-4o-mini (2024-07-18) · OpenAI	78.2	$0.15
33	Mistral Small 3.1 24B · Mistral AI	75.0	$0.35
34	gpt-oss-20b · OpenAI	73.2	$0.03

Frequently asked questions

Pulled from the HELM · IFEval dataset · updated daily

What does HELM · IFEval measure?

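HELM · IFEval measures instruction following: how reliably a model obeys explicit, verifiable instructions in a prompt, such as length or formatting constraints. It is listed in the BenchGecko catalog, where 34 AI models have been tested on it. Scores range from 73.2 to 95.1 out of 100.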

Which model leads on HELM · IFEval?

Grok 3 Mini Beta from xAI leads HELM · IFEval with a score of 95.1. The median score across 34 tested models is 84.0.
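
The leader and median quoted above can be rechecked from the table. The sketch below is only an illustration: the scores are copied from the leaderboard, and the variable names are assumptions made for this example. With 34 scores there are two middle values (84.1 and 84.0), so the conventional median averages them, while the lower-median convention returns 84.0, which matches the figure quoted above. Which convention BenchGecko uses is not stated.

```python
from statistics import median, median_low

# Scores copied from the leaderboard table, in rank order.
scores = [
    95.1, 94.9, 93.5, 93.2, 92.9, 92.7, 90.4, 89.8, 88.4, 87.6,
    87.6, 87.5, 86.9, 85.6, 85.0, 84.3, 84.1, 84.0, 83.8, 83.7,
    83.6, 83.4, 83.2, 83.1, 82.4, 82.3, 81.7, 81.0, 81.0, 79.2,
    78.4, 78.2, 75.0, 73.2,
]

top_score = max(scores)          # score of the leading model
mid_average = median(scores)     # mean of the two middle values
mid_lower = median_low(scores)   # lower of the two middle values

print(f"models: {len(scores)}, top: {top_score}, lowest: {min(scores)}")
print(f"median (averaged): {mid_average:.2f}, median (lower): {mid_lower}")
```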

Is HELM · IFEval saturated?

Yes. The top model on HELM · IFEval has reached 95.1 out of 100, which is within 5% of the theoretical ceiling of 100. The benchmark is approaching saturation and may be replaced by a harder successor.
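
The saturation rule described above can be written as a simple headroom check. This is a minimal sketch assuming the rule is "top score within 5% of the ceiling", as the answer states. The function names and default arguments are hypothetical and are not taken from BenchGecko.

```python
def headroom_fraction(top_score: float, ceiling: float = 100.0) -> float:
    """Fraction of the score range still unclaimed by the top model."""
    return (ceiling - top_score) / ceiling


def is_saturated(top_score: float, ceiling: float = 100.0,
                 threshold: float = 0.05) -> bool:
    """True when the top score is within `threshold` of the ceiling.

    The 0.05 default mirrors the "within 5%" wording; it is an assumed rule.
    """
    return headroom_fraction(top_score, ceiling) <= threshold


# Top score from the leaderboard: 95.1 leaves about 4.9% headroom.
print(is_saturated(95.1))
```

The check looks only at the leading model. The median of 84.0 shows that most of the tested models sit well below the ceiling.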

Does HELM · IFEval predict performance on other benchmarks?

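Possibly. HELM · IFEval scores correlate 0.93 with Cybench scores, but only across the 5 models the two benchmarks share. Among those models, higher HELM · IFEval scores go with higher Cybench scores. A sample this small is too thin to generalize from.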
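
A correlation measured on 5 models carries wide uncertainty. The sketch below estimates an approximate 95% confidence interval for the reported 0.93 using the Fisher z-transformation. It assumes the figure is a Pearson correlation and that the usual normal approximation applies, neither of which the page states. The function name is hypothetical.

```python
import math


def fisher_interval(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation; z_crit=1.96 gives roughly 95% coverage.
    """
    if n <= 3:
        raise ValueError("the Fisher interval needs more than 3 samples")
    z = math.atanh(r)                   # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)         # standard error on the z scale
    low = math.tanh(z - z_crit * se)    # transform the bounds back to r
    high = math.tanh(z + z_crit * se)
    return low, high


# Values from the answer above: r = 0.93 across n = 5 shared models.
low, high = fisher_interval(0.93, 5)
print(f"approximate 95% interval: {low:.2f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 0.27 to 0.995, so the data are compatible with anything from a weak relationship to a near-perfect one.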

How often is HELM · IFEval data refreshed?

BenchGecko pulls updates daily. New model scores on HELM · IFEval appear as soon as they are published by Epoch AI or the model provider.