
phi-3-mini 3.8B vs Qwen-14B

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

phi-3-mini 3.8B wins all 3 shared benchmarks and leads in both categories, knowledge and reasoning.

Category leads
knowledge · phi-3-mini 3.8B
reasoning · phi-3-mini 3.8B
Hype vs Reality
phi-3-mini 3.8B · #34 by performance · no signal · QUIET
Qwen-14B · #35 by performance · no signal · QUIET
Best value
phi-3-mini 3.8B · no price listed
Qwen-14B · no price listed
Vendor risk
Microsoft · $3.00T · Big Tech · Low risk
Alibaba (Qwen) · $293.0B · Tier 1 · Low risk
Head to head
phi-3-mini 3.8B · Qwen-14B
ARC AI2
phi-3-mini 3.8B leads by +0.7
AI2 Reasoning Challenge · grade-school science multiple-choice questions that require reasoning beyond simple retrieval.
phi-3-mini 3.8B · 79.9
Qwen-14B · 79.2
BBH
phi-3-mini 3.8B leads by +22.3
BIG-Bench Hard · a curated subset of 23 challenging BIG-Bench tasks on which earlier language models failed to outperform the average human rater.
phi-3-mini 3.8B · 62.3
Qwen-14B · 40.0
MMLU
phi-3-mini 3.8B leads by +3.3
Massive Multitask Language Understanding · multiple-choice questions across 57 subjects spanning STEM, the humanities, the social sciences, and more. A widely used benchmark for broad knowledge.
phi-3-mini 3.8B · 58.4
Qwen-14B · 55.1
Full benchmark table
Benchmark | phi-3-mini 3.8B | Qwen-14B
ARC AI2 | 79.9 | 79.2
BBH | 62.3 | 40.0
MMLU | 58.4 | 55.1
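The winner summary, per-benchmark margins, and category leads can all be derived from the three scores above. The sketch below shows one way to do that. It assumes higher is better on every benchmark, and it uses a hypothetical benchmark-to-category mapping (ARC AI2 and BBH as reasoning, MMLU as knowledge) because the page does not say which benchmarks feed each category. Deciding category leads by mean score is also this sketch's own choice, not necessarily how the page computes them. The names `head_to_head`, `category_leads`, `SCORES`, and `CATEGORY` are illustrative only.

```python
# Scores as listed on this page; assumed higher-is-better on all three benchmarks.
SCORES = {
    "ARC AI2": {"phi-3-mini 3.8B": 79.9, "Qwen-14B": 79.2},
    "BBH": {"phi-3-mini 3.8B": 62.3, "Qwen-14B": 40.0},
    "MMLU": {"phi-3-mini 3.8B": 58.4, "Qwen-14B": 55.1},
}

# Hypothetical benchmark-to-category mapping: the page does not say
# which benchmarks feed its "knowledge" and "reasoning" categories.
CATEGORY = {"ARC AI2": "reasoning", "BBH": "reasoning", "MMLU": "knowledge"}


def head_to_head(scores, a, b):
    """Return {benchmark: (leader, margin)} and a win count per model."""
    leads = {}
    wins = {a: 0, b: 0}
    for bench, s in scores.items():
        # Ties go to `a`; an arbitrary choice for this sketch.
        leader = a if s[a] >= s[b] else b
        # Round to one decimal to match the precision of the published scores.
        margin = round(abs(s[a] - s[b]), 1)
        leads[bench] = (leader, margin)
        wins[leader] += 1
    return leads, wins


def category_leads(scores, category, a, b):
    """Return {category: leader}, comparing mean scores within each category."""
    grouped = {}
    for bench, cat in category.items():
        bucket = grouped.setdefault(cat, {a: [], b: []})
        bucket[a].append(scores[bench][a])
        bucket[b].append(scores[bench][b])
    result = {}
    for cat, bucket in grouped.items():
        mean_a = sum(bucket[a]) / len(bucket[a])
        mean_b = sum(bucket[b]) / len(bucket[b])
        result[cat] = a if mean_a >= mean_b else b
    return result


if __name__ == "__main__":
    leads, wins = head_to_head(SCORES, "phi-3-mini 3.8B", "Qwen-14B")
    print(wins)  # {'phi-3-mini 3.8B': 3, 'Qwen-14B': 0}
    print(leads)
    print(category_leads(SCORES, CATEGORY, "phi-3-mini 3.8B", "Qwen-14B"))
```

Rounding the margin to one decimal matters here: raw floating-point subtraction gives values such as 22.299999999999997 for BBH, which would not match the "+22.3" shown on the page.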
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model | Input | Output | Context | Projected $/mo
phi-3-mini 3.8B | not listed | not listed | not listed | not listed
Qwen-14B | not listed | not listed | not listed | not listed
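The pricing header projects a monthly cost at 10M tokens from per-1M-token input and output prices, but it does not say how those 10M tokens split between input and output. The following is a minimal sketch of that arithmetic. It assumes a 75/25 input/output split, which is our own assumption and not the page's. The function name `projected_monthly_cost` is hypothetical. Neither model has a listed price here, so the usage line uses made-up prices purely for illustration.

```python
TOKENS_PER_UNIT = 1_000_000  # prices are quoted per 1M tokens


def projected_monthly_cost(input_price, output_price, monthly_tokens=10_000_000, input_share=0.75):
    """Projected $/month for `monthly_tokens`, given $ per 1M input and output tokens.

    `input_share` (fraction of tokens that are input) is an assumption;
    the page does not state the split it uses.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = monthly_tokens * input_share
    output_tokens = monthly_tokens - input_tokens
    return (input_tokens * input_price + output_tokens * output_price) / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Hypothetical prices: neither model has a listed price on this page.
    print(projected_monthly_cost(input_price=1.0, output_price=3.0))  # 15.0
```

Exposing `input_share` as a parameter keeps the split assumption visible. A chat-heavy workload with long outputs would lower it, and a retrieval-heavy workload with long prompts would raise it.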