Llama 3 8B Instruct vs Qwen2-72B
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen2-72B wins 8 of 9 shared benchmarks, with category leads in knowledge, general, language, and math; Llama 3 8B Instruct leads in reasoning.
Category leads
- knowledge: Qwen2-72B
- general: Qwen2-72B
- language: Qwen2-72B
- math: Qwen2-72B
- reasoning: Llama 3 8B Instruct
Hype vs Reality
Attention vs performance
- Llama 3 8B Instruct: #182 by performance · no attention signal
- Qwen2-72B: #137 by performance · no attention signal
Best value
Winner: Llama 3 8B Instruct
- Llama 3 8B Instruct: 880.0 pts/$ at $0.04/M
- Qwen2-72B: — (no public price)
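A points-per-dollar figure like this is just an aggregate benchmark score divided by a per-token price. The exact aggregate behind 880.0 pts/$ isn't shown on the page, so the sketch below assumes a hypothetical 35.2-point aggregate against the listed $0.04/M output price, which happens to reproduce the number:

```python
def value_score(aggregate_pts: float, price_per_m_tokens: float) -> float:
    """Benchmark points per dollar of per-million-token price."""
    if price_per_m_tokens <= 0:
        raise ValueError("no public price, so the value score is undefined")
    return aggregate_pts / price_per_m_tokens

# Hypothetical 35.2-pt aggregate at the listed $0.04/M output price:
print(f"{value_score(35.2, 0.04):.1f} pts/$")  # 880.0 pts/$
```

With no public price for Qwen2-72B, the division is undefined, which is why its value cell shows a dash.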
Vendor risk
Who is behind the model
- Llama 3 8B Instruct: Meta AI · $1.50T market cap · Tier 1
- Qwen2-72B: Alibaba (Qwen) · $293.0B market cap · Tier 1
Head to head
9 benchmarks · 2 models
GPQA Diamond · Qwen2-72B leads by +19.6
Graduate-Level Google-Proof QA (Diamond set): expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Llama 3 8B Instruct: 1.4 · Qwen2-72B: 21.0
BBH (HuggingFace) · Qwen2-72B leads by +33.5
Llama 3 8B Instruct: 18.4 · Qwen2-72B: 51.9
GPQA · Qwen2-72B leads by +17.1
Llama 3 8B Instruct: 2.1 · Qwen2-72B: 19.2
IFEval · Qwen2-72B leads by +14.2
Llama 3 8B Instruct: 24.0 · Qwen2-72B: 38.2
MATH Level 5 · Qwen2-72B leads by +27.2
Llama 3 8B Instruct: 3.9 · Qwen2-72B: 31.1
MMLU-PRO · Qwen2-72B leads by +34.8
Llama 3 8B Instruct: 17.8 · Qwen2-72B: 52.6
MUSR · Llama 3 8B Instruct leads by +0.2
Llama 3 8B Instruct: 19.9 · Qwen2-72B: 19.7
MATH level 5 (second listing) · Qwen2-72B leads by +33.0
MATH Level 5: the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
Llama 3 8B Instruct: 6.1 · Qwen2-72B: 39.1
MMLU · Qwen2-72B leads by +18.1
Massive Multitask Language Understanding: 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Llama 3 8B Instruct: 58.4 · Qwen2-72B: 76.5
Full benchmark table
| Benchmark | Llama 3 8B Instruct | Qwen2-72B |
|---|---|---|
| GPQA Diamond | 1.4 | 21.0 |
| BBH (HuggingFace) | 18.4 | 51.9 |
| GPQA | 2.1 | 19.2 |
| IFEval | 24.0 | 38.2 |
| MATH Level 5 | 3.9 | 31.1 |
| MMLU-PRO | 17.8 | 52.6 |
| MUSR | 19.9 | 19.7 |
| MATH level 5 (second listing) | 6.1 | 39.1 |
| MMLU | 58.4 | 76.5 |
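The winner count and the per-benchmark margins above are plain arithmetic over this table; a minimal sketch of the tally, with scores hardcoded from the rows above:

```python
# (Llama 3 8B Instruct, Qwen2-72B) scores per shared benchmark.
scores = {
    "GPQA Diamond": (1.4, 21.0),
    "BBH (HuggingFace)": (18.4, 51.9),
    "GPQA": (2.1, 19.2),
    "IFEval": (24.0, 38.2),
    "MATH Level 5": (3.9, 31.1),
    "MMLU-PRO": (17.8, 52.6),
    "MUSR": (19.9, 19.7),
    "MATH level 5 (second listing)": (6.1, 39.1),
    "MMLU": (58.4, 76.5),
}

qwen_wins = sum(qwen > llama for llama, qwen in scores.values())
print(f"Qwen2-72B wins {qwen_wins} of {len(scores)}")  # 8 of 9
for name, (llama, qwen) in scores.items():
    leader = "Qwen2-72B" if qwen > llama else "Llama 3 8B Instruct"
    print(f"{name}: {leader} leads by +{abs(qwen - llama):.1f}")
```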
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Llama 3 8B Instruct | $0.03 | $0.04 | 8K tokens (~6,000 words) | $0.33 |
| Qwen2-72B | — | — | — | — |
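The projected monthly figure depends on how the 10M tokens split between input and output, which the page doesn't state; the sketch below assumes a 70/30 input/output split, which reproduces the $0.33 projection for Llama 3 8B Instruct:

```python
def projected_monthly_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    total_tokens_m: float = 10.0,   # 10M tokens per month, per the heading
    input_share: float = 0.70,      # assumed split; not stated on the page
) -> float:
    """Projected monthly spend in dollars for a given token volume (millions)."""
    input_cost = total_tokens_m * input_share * input_price_per_m
    output_cost = total_tokens_m * (1.0 - input_share) * output_price_per_m
    return input_cost + output_cost

# Llama 3 8B Instruct at $0.03/M input and $0.04/M output:
print(f"${projected_monthly_cost(0.03, 0.04):.2f}/mo")  # $0.33/mo
```

Qwen2-72B has no listed price, so no projection is possible.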