
Llama 3 8B Instruct vs Qwen2-72B

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Qwen2-72B wins 8 of 9 shared benchmarks. Leads in knowledge · general · language.

Category leads
knowledge · Qwen2-72B
general · Qwen2-72B
language · Qwen2-72B
math · Qwen2-72B
reasoning · Llama 3 8B Instruct
Hype vs Reality
Llama 3 8B Instruct · #182 by perf · no signal · quiet
Qwen2-72B · #137 by perf · no signal · quiet
Best value
Llama 3 8B Instruct · 880.0 pts/$ · $0.04/M
Qwen2-72B · no price listed
Vendor risk
Meta AI · $1.50T · Tier 1 · Low risk
Alibaba (Qwen) · $293.0B · Tier 1 · Low risk
Head to head
GPQA diamond
Qwen2-72B leads by +19.6
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Llama 3 8B Instruct
1.4
Qwen2-72B
21.0
BBH (HuggingFace)
Qwen2-72B leads by +33.5
Llama 3 8B Instruct
18.4
Qwen2-72B
51.9
GPQA
Qwen2-72B leads by +17.1
Llama 3 8B Instruct
2.1
Qwen2-72B
19.2
IFEval
Qwen2-72B leads by +14.2
Llama 3 8B Instruct
24.0
Qwen2-72B
38.2
MATH Level 5
Qwen2-72B leads by +27.2
Llama 3 8B Instruct
3.9
Qwen2-72B
31.1
MMLU-PRO
Qwen2-72B leads by +34.8
Llama 3 8B Instruct
17.8
Qwen2-72B
52.6
MUSR
Llama 3 8B Instruct leads by +0.2
Llama 3 8B Instruct
19.9
Qwen2-72B
19.7
MATH level 5
Qwen2-72B leads by +33.0
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
Llama 3 8B Instruct
6.1
Qwen2-72B
39.1
MMLU
Qwen2-72B leads by +18.1
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Llama 3 8B Instruct
58.4
Qwen2-72B
76.5
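The head-to-head margins above are plain score differences, so they can be sanity-checked by recomputing them from the raw numbers. A minimal sketch in Python (the "MATH level 5 (alt)" label is mine, added only to distinguish the page's two MATH Level 5 entries):

```python
# Raw scores copied from the head-to-head section above:
# benchmark -> (Llama 3 8B Instruct, Qwen2-72B)
scores = {
    "GPQA diamond": (1.4, 21.0),
    "BBH (HuggingFace)": (18.4, 51.9),
    "GPQA": (2.1, 19.2),
    "IFEval": (24.0, 38.2),
    "MATH Level 5": (3.9, 31.1),
    "MMLU-PRO": (17.8, 52.6),
    "MUSR": (19.9, 19.7),
    "MATH level 5 (alt)": (6.1, 39.1),  # label is mine; second MATH entry
    "MMLU": (58.4, 76.5),
}

# A positive delta means Qwen2-72B leads on that benchmark.
for name, (llama, qwen) in scores.items():
    delta = round(qwen - llama, 1)
    leader = "Qwen2-72B" if delta > 0 else "Llama 3 8B Instruct"
    print(f"{name}: {leader} leads by {abs(delta)}")

# Qwen2-72B wins every shared benchmark except MUSR: 8 of 9.
wins = sum(1 for llama, qwen in scores.values() if qwen > llama)
print(f"Qwen2-72B wins {wins} of {len(scores)}")
```

Recomputing this way also catches rounding slips in displayed margins.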
Full benchmark table
Benchmark · Llama 3 8B Instruct · Qwen2-72B
GPQA diamond · 1.4 · 21.0
BBH (HuggingFace) · 18.4 · 51.9
GPQA · 2.1 · 19.2
IFEval · 24.0 · 38.2
MATH Level 5 · 3.9 · 31.1
MMLU-PRO · 17.8 · 52.6
MUSR · 19.9 · 19.7
MATH level 5 · 6.1 · 39.1
MMLU · 58.4 · 76.5
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
Llama 3 8B Instruct · $0.03 · $0.04 · 8K tokens (~4 books) · $0.33
Qwen2-72B · no price listed
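The projected monthly figure follows from the per-million-token rates. The page does not state the assumed input/output split; the sketch below uses a 70% input / 30% output split, which is an assumption that happens to reproduce the listed $0.33/mo for Llama 3 8B Instruct at 10M tokens:

```python
# Rates from the pricing table above (Llama 3 8B Instruct).
INPUT_PRICE = 0.03    # $ per 1M input tokens
OUTPUT_PRICE = 0.04   # $ per 1M output tokens
MONTHLY_TOKENS_M = 10.0  # 10M tokens per month, as in the table header
INPUT_SHARE = 0.7        # ASSUMPTION: 70% input / 30% output split

cost = (MONTHLY_TOKENS_M * INPUT_SHARE * INPUT_PRICE
        + MONTHLY_TOKENS_M * (1 - INPUT_SHARE) * OUTPUT_PRICE)
print(f"${cost:.2f}/mo")  # → $0.33/mo
```

A different split shifts the projection only slightly at these rates (all-input gives $0.30, all-output $0.40), so the headline figure is robust to the assumption.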