phi-3-mini 3.8B vs Qwen-14B
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
phi-3-mini 3.8B wins on 3/3 benchmarks
It scores higher on every benchmark the two models share and leads in both categories, knowledge and reasoning.
Category leads
knowledge · phi-3-mini 3.8B
reasoning · phi-3-mini 3.8B
Hype vs Reality
Attention vs performance
phi-3-mini 3.8B
#34 by performance · no attention signal
Qwen-14B
#35 by performance · no attention signal
Vendor risk
Who is behind the model
Microsoft
$3.00T · Big Tech
Alibaba (Qwen)
$293.0B · Tier 1
Head to head
3 benchmarks · 2 models
phi-3-mini 3.8B · Qwen-14B
ARC AI2
phi-3-mini 3.8B leads by +0.7
AI2 Reasoning Challenge · tests grade-school-level science knowledge with multiple-choice questions that require reasoning beyond simple retrieval.
phi-3-mini 3.8B
79.9
Qwen-14B
79.2
BBH
phi-3-mini 3.8B leads by +22.3
BIG-Bench Hard · a curated subset of 23 challenging tasks from BIG-Bench where language models previously failed to outperform average humans.
phi-3-mini 3.8B
62.3
Qwen-14B
40.0
MMLU
phi-3-mini 3.8B leads by +3.3
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. A widely used benchmark for broad knowledge.
phi-3-mini 3.8B
58.4
Qwen-14B
55.1
Full benchmark table
| Benchmark | phi-3-mini 3.8B | Qwen-14B |
|---|---|---|
| ARC AI2 (AI2 Reasoning Challenge) | 79.9 | 79.2 |
| BBH (BIG-Bench Hard) | 62.3 | 40.0 |
| MMLU (Massive Multitask Language Understanding) | 58.4 | 55.1 |
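The "leads by" figures and the win count can be reproduced from the scores in the benchmark table. Below is a minimal Python sketch. It assumes that a lead is the plain score difference (first model minus second), rounded to one decimal place, and that a "win" means a strictly higher score on a benchmark both models share. The page does not state its exact rule, and the function name `head_to_head` is hypothetical.

```python
# Scores copied from the benchmark table above.
SCORES = {
    "ARC AI2": {"phi-3-mini 3.8B": 79.9, "Qwen-14B": 79.2},
    "BBH": {"phi-3-mini 3.8B": 62.3, "Qwen-14B": 40.0},
    "MMLU": {"phi-3-mini 3.8B": 58.4, "Qwen-14B": 55.1},
}


def head_to_head(scores: dict[str, dict[str, float]], a: str, b: str):
    """Return per-benchmark leads (a minus b) and a's win count.

    Assumption (hypothetical rule): only benchmarks scored for both models
    count, and a win is a strictly positive lead after rounding.
    """
    leads: dict[str, float] = {}
    wins = 0
    for bench, row in scores.items():
        if a in row and b in row:  # skip benchmarks that are not shared
            lead = round(row[a] - row[b], 1)  # round away float noise
            leads[bench] = lead
            if lead > 0:
                wins += 1
    return leads, wins


leads, wins = head_to_head(SCORES, "phi-3-mini 3.8B", "Qwen-14B")
for bench, lead in leads.items():
    print(f"{bench}: {lead:+.1f}")
print(f"wins {wins} of {len(leads)} shared benchmarks")
```

Rounding before comparing keeps floating-point residue (for example, 79.9 - 79.2 is not exactly 0.7 in binary) from changing the displayed lead.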
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| phi-3-mini 3.8B | — | — | — | — |
| Qwen-14B | — | — | — | — |
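The projected $/mo column combines the per-1M-token input and output prices with a monthly volume of 10M tokens. The page does not say how those 10M tokens are split between input and output, so the sketch below assumes a hypothetical 3:1 input-to-output split. It uses placeholder prices, because no pricing is listed for either model. `projected_monthly_cost` and `input_share` are illustrative names, not part of any published API.

```python
def projected_monthly_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    monthly_tokens: int = 10_000_000,
    input_share: float = 0.75,  # hypothetical 3:1 input:output split
) -> float:
    """Projected monthly spend in dollars.

    Prices are in dollars per 1M tokens. input_share is an assumption:
    the fraction of monthly_tokens that are input (prompt) tokens.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = monthly_tokens * input_share
    output_tokens = monthly_tokens - input_tokens
    # Convert token counts to millions, then multiply by the per-1M prices.
    return (input_tokens / 1_000_000) * input_price_per_m + (
        output_tokens / 1_000_000
    ) * output_price_per_m


# Placeholder prices only; neither model has listed pricing on this page.
print(f"${projected_monthly_cost(0.10, 0.30):.2f}/mo")
```

If a provider charges the same rate for input and output, the split stops mattering: the projection is just 10 times the per-1M price.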