Benchmark · Knowledge · Settled

MATH Level 5

Updated 2025-04-15
Models tested: 70
Top score: 62.5 (Qwen2.5 32B Instruct)
Median: 12.9 (min 0.1)
Top-5 spread: σ 2.5 (Competitive)
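
The median and top-5 spread can be checked by hand against the score column. The following is a minimal sketch, not the site's own method: it assumes the spread is the population standard deviation of the five highest scores (the page does not state its formula), and that the median of 70 scores is the mean of the rank 35 and rank 36 values.

```python
import statistics

# Five highest scores, copied from the leaderboard.
top5 = [62.5, 60.1, 59.8, 57.0, 55.3]

# With 70 models the median is the mean of the rank 35 and rank 36 scores.
rank35, rank36 = 15.5, 10.3

median = round((rank35 + rank36) / 2, 1)
# Assumption: "spread" means population standard deviation (pstdev).
spread = round(statistics.pstdev(top5), 1)

print(f"median={median} top5_spread={spread}")
```

Under these assumptions both values agree with the summary figures; the sample standard deviation (`statistics.stdev`) would give a larger number, which is why the population form is assumed here.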

Best score over time · one chart, every benchmark

[Chart: MATH Level 5 · 42 models · frontier running max. Y-axis: score (0 to 100). X-axis: release date (May 24 to Apr 25). Source: benchgecko.ai/benchmark/hf-math-lvl5]
Frontier on MATH Level 5 rose from 27.6 to 62.5 in 4 months · +34.9 points · latest leader Qwen2.5 32B Instruct from Alibaba.
Pink dots = frontier records · 4 total
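
A "frontier running max" keeps, in release-date order, only the models that beat every score released before them. The sketch below illustrates the idea; the model names and release dates are hypothetical placeholders (the page does not list release dates), and `frontier` is an illustrative helper, not part of any site API. Only the first and last frontier scores, 27.6 and 62.5, are taken from the page.

```python
from datetime import date

def frontier(records):
    """Return the records that set a new best score, in release order."""
    best = float("-inf")
    out = []
    for released, name, score in sorted(records):
        if score > best:  # strictly better than everything released earlier
            best = score
            out.append((released, name, score))
    return out

# Hypothetical inputs: names and dates are placeholders, not leaderboard data.
records = [
    (date(2024, 6, 1), "model-a", 27.6),
    (date(2024, 7, 1), "model-b", 19.5),
    (date(2024, 9, 1), "model-c", 62.5),
    (date(2024, 8, 1), "model-d", 48.3),
]

frontier_records = frontier(records)
# Gain between the first and the latest frontier record.
gain = round(frontier_records[-1][2] - frontier_records[0][2], 1)
print([name for _, name, _ in frontier_records], gain)
```

Sorting by release date before scanning matters: a later, weaker model (here `model-b`) never enters the frontier, while each entry that does corresponds to one "pink dot" on the chart.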

70 models tested · sorted by score

#   Model                             Organization     Score
1   Qwen2.5 32B Instruct              Alibaba          62.5
2   Qwen2.5 72B Instruct Abliterated  -                60.1
3   Qwen2.5 72B Instruct              Alibaba Qwen     59.8
4   DeepSeek R1 Distill Qwen 14B      DeepSeek         57.0
5   Qwen2.5 14B Instruct              Alibaba          55.3
6   Phi 4                             Microsoft        50.0
7   Qwen2.5 7B Instruct               Alibaba Qwen     50.0
8   Qwen2.5 Coder 32B Instruct        Alibaba Qwen     49.5
9   Llama 3.3 70B Instruct            Meta             48.3
10  Llama 3.1 70B Instruct            Meta             38.1
11  Qwen2.5 Coder 7B Instruct         Alibaba Qwen     37.2
12  Qwen2.5 3B Instruct               Alibaba          36.8
13  Qwen2.5 Coder 14B Instruct        Alibaba          32.5
14  Qwen2-72B                         Alibaba Qwen     31.1
15  R1 Distill Llama 70B              DeepSeek         30.7
16  Qwen2 7B Instruct                 Alibaba          27.6
17  WizardLM-2 8x22B                  Microsoft        25.0
18  Gemma 2 27B                       Google DeepMind  23.9
19  Qwen2.5 1.5B Instruct             Alibaba          22.1
20  DeepSeek R1 Distill Llama 8B      DeepSeek         22.0
21  Hermes 3 70B Instruct             nousresearch     21.0
22  Magnum v4 72B                     anthracite-org   20.0
23  Qwen2 VL 7B Instruct              Alibaba          19.9
24  Phi 3.5 Mini Instruct             Microsoft        19.6
25  DeepSeek R1 Distill Qwen 7B       DeepSeek         19.6
26  Gemma 2 9B                        Google DeepMind  19.5
27  Dolphin 2.9.1 Yi 1.5 34b          -                18.7
28  Meta Llama 3 8B                   Meta             18.6
29  Llama 3.2 3B Instruct             Meta             17.7
30  R1 Distill Qwen 32B               DeepSeek         17.1
31  Phi 4 Mini Instruct               Microsoft        17.0
32  DeepSeek R1 Distill Qwen 1.5B     DeepSeek         16.9
33  Phi 3 Mini 4k Instruct            Microsoft        16.4
34  QwQ 32B                           Alibaba Qwen     16.1
35  Llama 3.1 8B Instruct             Meta             15.5
36  Qwen2.5 0.5B Instruct             Alibaba          10.3
37  Meta Llama 3 8B Instruct          Meta             8.7
38  Hermes 2 Pro - Llama-3 8B         nousresearch     8.4
39  Llama 3.2 1B Instruct             Meta             8.2
40  Gemma 2B                          Google DeepMind  7.4
41  Qwen2 1.5B Instruct               Alibaba          7.2
42  StarCoder 2 15B                   -                6.0
43  Yi 6B                             -                5.1
44  Stable Beluga 2                   -                4.4
45  Llama 3 8B Instruct               Meta             3.9
46  LLaMA-13B                         Meta             3.1
47  Gemma 2 2b                        Google DeepMind  3.0
48  Mistral 7B Instruct V0.2          Mistral AI       3.0
49  Mistral 7B V0.1                   Mistral AI       3.0
50  Phi 2                             Microsoft        3.0
51  Qwen2 0.5B Instruct               Alibaba          2.9
52  Falcon-180B                       TII              2.8
53  Qwen2 0.5B                        Alibaba          2.6
54  Mistral 7B Instruct v0.1          Mistral AI       2.3
55  Llama 2 7b Chat Hf                Meta             2.0
56  Llama 3.2 3B Instruct (free)      Meta             1.9
57  Phi-1.5                           Microsoft        1.8
58  Llama 2 7b Hf                     Meta             1.7
59  MPT-30B                           -                1.6
60  TinyLlama 1.1B Chat V1.0          -                1.5
61  Vicuna 7b V1.5                    -                1.4
62  Gpt2 Large                        OpenAI           1.2
63  SmolLM2 135M                      -                1.2
64  Pythia 160m                       eleutherai       0.9
65  Gpt2 Medium                       OpenAI           0.8
66  Distilgpt2                        -                0.6
67  Gpt Neo 125m                      eleutherai       0.6
68  SmolLM2 135M Instruct             -                0.3
69  Gpt2                              OpenAI           0.2
70  Gemma 2 2b It                     Google DeepMind  0.1

Same category · related evaluations