Benchmark · Knowledge · Settled

IFEval

Updated 2025-07-24
Models tested: 73
Top score: 90.0 (Llama 3.3 70B Instruct)
Median: 39.8 (min 6.0)
Top-5 spread: σ 2.1 (Competitive)
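
The summary figures above can be re-derived from the scores in the leaderboard below. The following is a minimal sketch, assuming that "Top-5 spread" is the population standard deviation of the five highest scores; the page does not state the formula, but that reading reproduces the displayed value. The variable names are illustrative.

```python
from statistics import median, pstdev

# Scores copied from the leaderboard, highest first (73 entries).
scores = [
    90.0, 86.7, 86.4, 85.9, 83.5, 81.4, 79.8, 76.6, 75.8, 74.4,
    74.1, 73.9, 73.8, 72.7, 69.1, 68.8, 64.8, 61.0, 58.1, 57.8,
    56.8, 56.7, 56.3, 55.0, 54.8, 53.6, 52.7, 50.6, 46.0, 44.9,
    44.8, 43.8, 43.4, 41.9, 40.4, 39.9, 39.8, 38.5, 38.2, 37.9,
    37.8, 34.6, 33.7, 32.6, 31.5, 30.5, 28.8, 27.8, 27.4, 26.6,
    25.3, 25.2, 24.0, 23.9, 23.5, 22.5, 22.1, 21.5, 20.5, 20.3,
    20.2, 19.1, 18.7, 18.2, 18.2, 18.1, 17.9, 17.6, 16.0, 14.3,
    13.4, 6.1, 6.0,
]

models_tested = len(scores)
top_score = max(scores)
lowest_score = min(scores)
median_score = median(scores)

# Assumption: "spread" means population standard deviation of the top five.
top5_spread = round(pstdev(scores[:5]), 1)

print(models_tested, top_score, median_score, lowest_score, top5_spread)
```

With an odd number of entries the median is simply the middle score (rank 37 of 73), which is why it coincides with a single model's result.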

Best score over time · one chart, every benchmark

[Chart: IFEval frontier (running max), 44 models. Y-axis: score, 0 to 100, higher is better. X-axis: release date, May 24 to Jul 25. Source: benchgecko.ai/benchmark/hf-ifeval]
Frontier on IFEval rose from 74.4 to 90.0 in 5 months · +15.6 points · latest leader Llama 3.3 70B Instruct from Meta.
Pink dots = frontier records · 4 total
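
A "frontier running max" chart marks a point each time a newly released model beats every earlier score. The sketch below shows one way to compute those records. Release dates are not listed on this page, so the input sequence is a placeholder rather than real leaderboard data, and `frontier_records` is a hypothetical helper name. It also assumes a tie does not count as a new record.

```python
def frontier_records(results):
    """Return the (label, score) pairs that set a new best score.

    `results` must already be ordered by release date, oldest first.
    """
    records = []
    best = float("-inf")
    for label, score in results:
        if score > best:  # strict comparison: ties do not create a record
            best = score
            records.append((label, score))
    return records


# Placeholder sequence for illustration only (not taken from the leaderboard).
example = [("m1", 50.0), ("m2", 40.0), ("m3", 60.0), ("m4", 60.0), ("m5", 75.0)]

records = frontier_records(example)
gain = records[-1][1] - records[0][1]  # last record minus first record
print(records, gain)
```

The reported gain (for IFEval, 90.0 minus 74.4, or +15.6 points) is the difference between the last and first frontier records.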

73 models tested · sorted by score

| # | Organization | Model | Score |
|---|---|---|---|
| 1 | Meta | Llama 3.3 70B Instruct | 90.0 |
| 2 | Meta | Llama 3.1 70B Instruct | 86.7 |
| 3 | Alibaba Qwen | Qwen2.5 72B Instruct | 86.4 |
| 4 |  | Qwen2.5 72B Instruct Abliterated | 85.9 |
| 5 | Alibaba | Qwen2.5 32B Instruct | 83.5 |
| 6 | Alibaba | Qwen2.5 14B Instruct | 81.4 |
| 7 | Google DeepMind | Gemma 2 27B | 79.8 |
| 8 | nousresearch | Hermes 3 70B Instruct | 76.6 |
| 9 | Alibaba Qwen | Qwen2.5 7B Instruct | 75.8 |
| 10 | Google DeepMind | Gemma 2 9B | 74.4 |
| 11 | Meta | Meta Llama 3 8B Instruct | 74.1 |
| 12 | Meta | Llama 3.2 3B Instruct | 73.9 |
| 13 | Microsoft | Phi 4 Mini Instruct | 73.8 |
| 14 | Alibaba Qwen | Qwen2.5 Coder 32B Instruct | 72.7 |
| 15 | Alibaba | Qwen2.5 Coder 14B Instruct | 69.1 |
| 16 | Microsoft | Phi 4 | 68.8 |
| 17 | Alibaba | Qwen2.5 3B Instruct | 64.8 |
| 18 | Alibaba Qwen | Qwen2.5 Coder 7B Instruct | 61.0 |
| 19 | Meta | Llama 3.2 1B Instruct | 58.1 |
| 20 | Microsoft | Phi 3.5 Mini Instruct | 57.8 |
| 21 | Alibaba | Qwen2 7B Instruct | 56.8 |
| 22 | Google DeepMind | Gemma 2 2b It | 56.7 |
| 23 | anthracite-org | Magnum v4 72B | 56.3 |
| 24 | Mistral AI | Mistral 7B Instruct V0.2 | 55.0 |
| 25 | Microsoft | Phi 3 Mini 4k Instruct | 54.8 |
| 26 | nousresearch | Hermes 2 Pro - Llama-3 8B | 53.6 |
| 27 | Microsoft | WizardLM-2 8x22B | 52.7 |
| 28 | Meta | Llama 3.1 8B Instruct | 50.6 |
| 29 | Alibaba | Qwen2 VL 7B Instruct | 46.0 |
| 30 | Mistral AI | Mistral 7B Instruct v0.1 | 44.9 |
| 31 | Alibaba | Qwen2.5 1.5B Instruct | 44.8 |
| 32 | DeepSeek | DeepSeek R1 Distill Qwen 14B | 43.8 |
| 33 | DeepSeek | R1 Distill Llama 70B | 43.4 |
| 34 | DeepSeek | R1 Distill Qwen 32B | 41.9 |
| 35 | DeepSeek | DeepSeek R1 Distill Qwen 7B | 40.4 |
| 36 | Meta | Llama 2 7b Chat Hf | 39.9 |
| 37 | Alibaba Qwen | QwQ 32B | 39.8 |
| 38 |  | Dolphin 2.9.1 Yi 1.5 34b | 38.5 |
| 39 | Alibaba Qwen | Qwen2-72B | 38.2 |
| 40 |  | Stable Beluga 2 | 37.9 |
| 41 | DeepSeek | DeepSeek R1 Distill Llama 8B | 37.8 |
| 42 | DeepSeek | DeepSeek R1 Distill Qwen 1.5B | 34.6 |
| 43 | Alibaba | Qwen2 1.5B Instruct | 33.7 |
| 44 | TII | Falcon-180B | 32.6 |
| 45 | Alibaba | Qwen2.5 0.5B Instruct | 31.5 |
| 46 |  | Yi 6B | 30.5 |
| 47 |  | SmolLM2 135M Instruct | 28.8 |
| 48 |  | StarCoder 2 15B | 27.8 |
| 49 | Microsoft | Phi 2 | 27.4 |
| 50 | Google DeepMind | Gemma 2B | 26.6 |
| 51 | Meta | LLaMA-13B | 25.3 |
| 52 | Meta | Llama 2 7b Hf | 25.2 |
| 53 | Meta | Llama 3 8B Instruct | 24.0 |
| 54 | Mistral AI | Mistral 7B V0.1 | 23.9 |
| 55 |  | Vicuna 7b V1.5 | 23.5 |
| 56 | Alibaba | Qwen2 0.5B Instruct | 22.5 |
| 57 | OpenAI | Gpt2 Medium | 22.1 |
| 58 |  | MPT-30B | 21.5 |
| 59 | OpenAI | Gpt2 Large | 20.5 |
| 60 | Microsoft | Phi-1.5 | 20.3 |
| 61 | Google DeepMind | Gemma 2 2b | 20.2 |
| 62 | eleutherai | Gpt Neo 125m | 19.1 |
| 63 | Alibaba | Qwen2 0.5B | 18.7 |
| 64 |  | SmolLM2 135M | 18.2 |
| 65 | eleutherai | Pythia 160m | 18.2 |
| 66 | Meta | Llama 3.1 405B | 18.1 |
| 67 | OpenAI | Gpt2 | 17.9 |
| 68 |  | INTELLECT-1 | 17.6 |
| 69 | Meta | Meta Llama 3 8B | 16.0 |
| 70 | z-ai | GLM 4 32B | 14.3 |
| 71 | Meta | Llama 3.2 3B Instruct (free) | 13.4 |
| 72 |  | Distilgpt2 | 6.1 |
| 73 |  | TinyLlama 1.1B Chat V1.0 | 6.0 |

Same category · related evaluations