Qwen2-72B vs Stable Beluga 2
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen2-72B wins all 7 shared benchmarks and leads in general, knowledge, and language.
Category leads
general · Qwen2-72B
knowledge · Qwen2-72B
language · Qwen2-72B
math · Qwen2-72B
reasoning · Qwen2-72B
Hype vs Reality
Attention vs performance
Qwen2-72B
#137 by perf · no signal
Stable Beluga 2
#100 by perf · no signal
Vendor risk
Who is behind the model
Alibaba (Qwen)
$293.0B · Tier 1
Unknown
private · undisclosed
Head to head
7 benchmarks · 2 models
Qwen2-72B · Stable Beluga 2
BBH (HuggingFace)
Qwen2-72B leads by +10.6
Qwen2-72B
51.9
Stable Beluga 2
41.3
GPQA
Qwen2-72B leads by +10.4
Qwen2-72B
19.2
Stable Beluga 2
8.8
IFEval
Qwen2-72B leads by +0.4
Qwen2-72B
38.2
Stable Beluga 2
37.9
MATH Level 5
Qwen2-72B leads by +26.7
Qwen2-72B
31.1
Stable Beluga 2
4.4
MMLU-PRO
Qwen2-72B leads by +26.7
Qwen2-72B
52.6
Stable Beluga 2
25.9
MUSR
Qwen2-72B leads by +1.1
Qwen2-72B
19.7
Stable Beluga 2
18.6
MMLU
Qwen2-72B leads by +18.4
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Qwen2-72B
76.5
Stable Beluga 2
58.1
Full benchmark table
| Benchmark | Qwen2-72B | Stable Beluga 2 |
|---|---|---|
| BBH (HuggingFace) | 51.9 | 41.3 |
| GPQA | 19.2 | 8.8 |
| IFEval | 38.2 | 37.9 |
| MATH Level 5 | 31.1 | 4.4 |
| MMLU-PRO | 52.6 | 25.9 |
| MUSR | 19.7 | 18.6 |
| MMLU | 76.5 | 58.1 |
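
The per-benchmark leads and the 7-of-7 win count can be recomputed from the scores in the table above. The following is a minimal sketch, not the page's own method: the names `SCORES` and `head_to_head` are hypothetical, and the differences are taken from the rounded scores as displayed. On that basis the IFEval gap works out to 0.3 rather than the listed +0.4, which may reflect unrounded underlying scores.

```python
# Scores copied from the benchmark table above: (Qwen2-72B, Stable Beluga 2).
SCORES = {
    "BBH (HuggingFace)": (51.9, 41.3),
    "GPQA": (19.2, 8.8),
    "IFEval": (38.2, 37.9),
    "MATH Level 5": (31.1, 4.4),
    "MMLU-PRO": (52.6, 25.9),
    "MUSR": (19.7, 18.6),
    "MMLU": (76.5, 58.1),
}


def head_to_head(scores):
    """Return (lead of the first model per benchmark, number of benchmarks it wins)."""
    # Round to one decimal place to match the precision shown on the page.
    leads = {name: round(a - b, 1) for name, (a, b) in scores.items()}
    wins = sum(1 for lead in leads.values() if lead > 0)
    return leads, wins


leads, wins = head_to_head(SCORES)
for name, lead in leads.items():
    print(f"{name}: Qwen2-72B lead {lead:+.1f}")
print(f"Qwen2-72B wins {wins} of {len(SCORES)} shared benchmarks")
```

The largest gaps are on MATH Level 5 and MMLU-PRO (both 26.7), while IFEval and MUSR are close enough that the two models are roughly comparable there.
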
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Qwen2-72B | — | — | — | — |
| Stable Beluga 2 | — | — | — | — |
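
A projected monthly figure at 10M tokens follows from the per-1M-token prices once they are known. The sketch below is one plausible way to compute it, under stated assumptions: the function name `projected_monthly_cost` and the `input_share` parameter are hypothetical, and the page does not say how the 10M tokens are split between input and output, so an even split is assumed by default. Since neither model lists a price here, the function returns `None` for both.

```python
from typing import Optional


def projected_monthly_cost(
    input_price_per_1m: Optional[float],
    output_price_per_1m: Optional[float],
    monthly_tokens: int = 10_000_000,
    input_share: float = 0.5,  # assumption: the input/output split is not stated on the page
) -> Optional[float]:
    """Project monthly spend in dollars, or None when a price is not listed."""
    if input_price_per_1m is None or output_price_per_1m is None:
        return None
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    millions = monthly_tokens / 1_000_000
    blended_price = (
        input_share * input_price_per_1m
        + (1.0 - input_share) * output_price_per_1m
    )
    return millions * blended_price


# Both rows in the pricing table show no price, so no projection is available.
for model in ("Qwen2-72B", "Stable Beluga 2"):
    print(model, projected_monthly_cost(None, None))
```
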