Llama 3 8B Instruct vs Qwen2-72B
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen2-72B wins 8 of 9 shared benchmarks, with category leads in knowledge, general, language, and math; Llama 3 8B Instruct leads in reasoning.
Category leads
- knowledge: Qwen2-72B
- general: Qwen2-72B
- language: Qwen2-72B
- math: Qwen2-72B
- reasoning: Llama 3 8B Instruct
Hype vs Reality
Attention vs performance
- Llama 3 8B Instruct: #182 by performance · no attention signal
- Qwen2-72B: #137 by performance · no attention signal
Best value
Winner: Llama 3 8B Instruct
- Llama 3 8B Instruct: 880.0 pts/$ at $0.04/M
- Qwen2-72B: — (no public price)
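A points-per-dollar figure like this is just an aggregate benchmark score divided by a per-token price. The exact aggregate behind 880.0 pts/$ isn't shown on the page, so the sketch below assumes a hypothetical 35.2-point aggregate against the listed $0.04/M output price, which happens to reproduce the number:

```python
def value_score(aggregate_pts: float, price_per_m_tokens: float) -> float:
    """Benchmark points per dollar of per-million-token price."""
    if price_per_m_tokens <= 0:
        raise ValueError("no public price, so the value score is undefined")
    return aggregate_pts / price_per_m_tokens

# Hypothetical 35.2-pt aggregate at the listed $0.04/M output price:
print(f"{value_score(35.2, 0.04):.1f} pts/$")  # 880.0 pts/$
```

With no public price for Qwen2-72B, the division is undefined, which is why its value cell shows a dash.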
Vendor risk
Who is behind the model
- Llama 3 8B Instruct: Meta AI · $1.50T market cap · Tier 1
- Qwen2-72B: Alibaba (Qwen) · $293.0B market cap · Tier 1
Head to head
9 benchmarks · 2 models
GPQA Diamond · Qwen2-72B leads by +19.6
Graduate-Level Google-Proof QA (Diamond set): expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Llama 3 8B Instruct: 1.4 · Qwen2-72B: 21.0
BBH (HuggingFace) · Qwen2-72B leads by +33.5
Llama 3 8B Instruct: 18.4 · Qwen2-72B: 51.9
GPQA · Qwen2-72B leads by +17.1
Llama 3 8B Instruct: 2.1 · Qwen2-72B: 19.2
IFEval · Qwen2-72B leads by +14.2
Llama 3 8B Instruct: 24.0 · Qwen2-72B: 38.2
MATH Level 5 · Qwen2-72B leads by +27.2
Llama 3 8B Instruct: 3.9 · Qwen2-72B: 31.1
MMLU-PRO · Qwen2-72B leads by +34.8
Llama 3 8B Instruct: 17.8 · Qwen2-72B: 52.6
MUSR · Llama 3 8B Instruct leads by +0.2
Llama 3 8B Instruct: 19.9 · Qwen2-72B: 19.7
MATH level 5 (second listing) · Qwen2-72B leads by +33.0
MATH Level 5: the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
Llama 3 8B Instruct: 6.1 · Qwen2-72B: 39.1
MMLU · Qwen2-72B leads by +18.1
Massive Multitask Language Understanding: 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Llama 3 8B Instruct: 58.4 · Qwen2-72B: 76.5
Full benchmark table
| Benchmark | Llama 3 8B Instruct | Qwen2-72B |
|---|---|---|
| GPQA Diamond | 1.4 | 21.0 |
| BBH (HuggingFace) | 18.4 | 51.9 |
| GPQA | 2.1 | 19.2 |
| IFEval | 24.0 | 38.2 |
| MATH Level 5 | 3.9 | 31.1 |
| MMLU-PRO | 17.8 | 52.6 |
| MUSR | 19.9 | 19.7 |
| MATH level 5 (second listing) | 6.1 | 39.1 |
| MMLU | 58.4 | 76.5 |
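The winner count and the per-benchmark margins above are plain arithmetic over this table; a minimal sketch of the tally, with scores hardcoded from the rows above:

```python
# (Llama 3 8B Instruct, Qwen2-72B) scores per shared benchmark.
scores = {
    "GPQA Diamond": (1.4, 21.0),
    "BBH (HuggingFace)": (18.4, 51.9),
    "GPQA": (2.1, 19.2),
    "IFEval": (24.0, 38.2),
    "MATH Level 5": (3.9, 31.1),
    "MMLU-PRO": (17.8, 52.6),
    "MUSR": (19.9, 19.7),
    "MATH level 5 (second listing)": (6.1, 39.1),
    "MMLU": (58.4, 76.5),
}

qwen_wins = sum(qwen > llama for llama, qwen in scores.values())
print(f"Qwen2-72B wins {qwen_wins} of {len(scores)}")  # 8 of 9
for name, (llama, qwen) in scores.items():
    leader = "Qwen2-72B" if qwen > llama else "Llama 3 8B Instruct"
    print(f"{name}: {leader} leads by +{abs(qwen - llama):.1f}")
```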
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Llama 3 8B Instruct | $0.03 | $0.04 | 8K tokens (~6,000 words) | $0.33 |
| Qwen2-72B | — | — | — | — |
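The projected monthly figure depends on how the 10M tokens split between input and output, which the page doesn't state; the sketch below assumes a 70/30 input/output split, which reproduces the $0.33 projection for Llama 3 8B Instruct:

```python
def projected_monthly_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    total_tokens_m: float = 10.0,   # 10M tokens per month, per the heading
    input_share: float = 0.70,      # assumed split; not stated on the page
) -> float:
    """Projected monthly spend in dollars for a given token volume (millions)."""
    input_cost = total_tokens_m * input_share * input_price_per_m
    output_cost = total_tokens_m * (1.0 - input_share) * output_price_per_m
    return input_cost + output_cost

# Llama 3 8B Instruct at $0.03/M input and $0.04/M output:
print(f"${projected_monthly_cost(0.03, 0.04):.2f}/mo")  # $0.33/mo
```

Qwen2-72B has no listed price, so no projection is possible.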