
phi-3-medium 14B vs Phi 4

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Phi 4 wins all 3 shared benchmarks and leads in both categories: knowledge and math.

Category leads
knowledge · Phi 4
math · Phi 4
Hype vs Reality
phi-3-medium 14B · #46 by performance · no signal · QUIET
Phi 4 · #124 by performance · no signal · QUIET
Best value
phi-3-medium 14B · no price listed
Phi 4 · 421.5 pts/$ · $0.10/M tokens
Vendor risk
phi-3-medium 14B · Microsoft · $3.00T · Big Tech · Low risk
Phi 4 · Microsoft · $3.00T · Big Tech · Low risk
Head to head
GPQA diamond
Phi 4 leads by +38.0
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
phi-3-medium 14B · 3.5
Phi 4 · 41.4
MATH level 5
Phi 4 leads by +47.4
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems drawn from AMC, AIME, and other Olympiad-style contests.
phi-3-medium 14B · 17.6
Phi 4 · 64.9
MMLU
Phi 4 leads by +9.1
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. A widely used benchmark for broad knowledge.
phi-3-medium 14B · 70.7
Phi 4 · 79.7
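As a cross-check on the head-to-head results, the sketch below recomputes each margin and the win count from the scores shown above. It is a minimal illustration: the function name `head_to_head` and the `SCORES` layout are hypothetical, not part of this page. Subtracting the displayed scores gives margins of 37.9, 47.3 and 9.0. Each is 0.1 below the stated leads of +38.0, +47.4 and +9.1, which suggests (an assumption, not confirmed by the page) that the margins are computed from unrounded scores.

```python
# Scores as displayed on this page, rounded to one decimal place.
SCORES = {
    "GPQA diamond": {"phi-3-medium 14B": 3.5, "Phi 4": 41.4},
    "MATH level 5": {"phi-3-medium 14B": 17.6, "Phi 4": 64.9},
    "MMLU": {"phi-3-medium 14B": 70.7, "Phi 4": 79.7},
}


def head_to_head(scores: dict[str, dict[str, float]], a: str, b: str):
    """Hypothetical helper: return ({benchmark: (leader, margin)}, {model: wins})."""
    results = {}
    wins = {a: 0, b: 0}
    for bench, row in scores.items():
        # Round to one decimal to drop float noise such as 47.300000000000004.
        diff = round(row[b] - row[a], 1)
        if diff == 0:
            results[bench] = ("tie", 0.0)
            continue
        leader = b if diff > 0 else a
        wins[leader] += 1
        results[bench] = (leader, abs(diff))
    return results, wins


results, wins = head_to_head(SCORES, "phi-3-medium 14B", "Phi 4")
for bench, (leader, margin) in results.items():
    print(f"{bench}: {leader} leads by +{margin}")
print(wins)
```

Ties are reported explicitly rather than credited to either model. None of the three shared benchmarks is tied here, so Phi 4 is credited with 3 of 3 wins.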
Full benchmark table
Benchmark | phi-3-medium 14B | Phi 4
GPQA diamond | 3.5 | 41.4
MATH level 5 | 17.6 | 64.9
MMLU | 70.7 | 79.7
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model | Input | Output | Context | Projected $/mo
phi-3-medium 14B | n/a | n/a | n/a | n/a
Phi 4 | $0.07 | $0.14 | 16K tokens (~8 books) | $0.84
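The page does not say how the 10M monthly tokens are split between input and output. An 80/20 input/output split reproduces the listed figure: 8M × $0.07 + 2M × $0.14 = $0.56 + $0.28 = $0.84. The sketch below uses that split as its default. Treat `input_share=0.8` as an assumption inferred from this arithmetic, and treat `projected_monthly_cost` as a hypothetical helper, not the page's published method.

```python
def projected_monthly_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    monthly_tokens_m: float = 10.0,
    input_share: float = 0.8,  # ASSUMPTION: 80/20 split inferred from the $0.84 figure
) -> float:
    """Hypothetical helper: projected $/month for a volume given in millions of tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens_m = monthly_tokens_m * input_share
    output_tokens_m = monthly_tokens_m - input_tokens_m
    return input_tokens_m * input_price_per_m + output_tokens_m * output_price_per_m


# Phi 4 list prices from the table: $0.07 input, $0.14 output per 1M tokens.
print(f"${projected_monthly_cost(0.07, 0.14):.2f}")
```

To model an output-heavy workload, change `input_share`. For example, a 50/50 split of the same 10M tokens would cost 5 × $0.07 + 5 × $0.14 = $1.05.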