Phi 4

by Microsoft · Released Jan 2025

Open Source
54.2
avg score
Rank #92
Better than 61% of all models
Context
16K tokens (~12,000 words)
Input $/1M
$0.07
Output $/1M
$0.14
Type
text
License
Open Source
Benchmarks
16 tested
About

Phi-4 from [Microsoft Research](/microsoft) is designed for complex reasoning tasks and to run efficiently where memory is limited or quick responses are needed. At 14 billion...

Tested on 16 benchmarks with a 43.2% average. Top scores: Chatbot Arena Elo, Overall (1255.4 Elo points, not a percentage), MMLU (79.7%), IFEval (68.8%).

Capabilities
reasoning
10.1
#134 globally
math
42.9
#99 globally
knowledge
42.6
#136 globally
speed
12.0
#62 globally
general
55.3
#6 globally
language
68.8
#72 globally
Benchmark Scores
Tested on 16 benchmarks · Ranked across 7 categories
[Chart: Score Distribution (all 233 models). Score axis from 0 to 100, with a marker at this model's position.]
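The "Better than 61% of all models" figure near the top of the page is consistent with rank #92 among the 233 models counted here. A minimal sketch of that arithmetic follows; the function name and the assumption that the figure counts models ranked strictly below this one are illustrative, not taken from the site.

```python
def percent_outranked(rank: int, total: int) -> float:
    """Percentage of models ranked strictly below `rank` out of `total`.

    Assumption: rank 1 is best and ties are ignored.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank) / total


# Figures from this page: rank #92 among 233 models.
print(round(percent_outranked(92, 233)))  # prints 61
```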
MuSR

HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.

10.1
MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

64.9
MATH Level 5

HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.

50.0
OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.

13.7
MMLU

Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely-cited knowledge benchmark.

79.7
Lech Mazur Writing

Writing quality evaluation by Lech Mazur. Tests prose quality, coherence, and stylistic ability.

62.6
MMLU-PRO

HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.

48.6
Score legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
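The legend bands can be applied to any of the scores listed above. The sketch below is one possible reading: the legend does not say which band owns the shared boundaries (70 and 85), so treating each lower bound as inclusive is an assumption, and `score_tier` is a hypothetical helper name.

```python
def score_tier(score: float) -> str:
    """Map a 0-100 benchmark score to the legend band.

    Assumption: lower bounds are inclusive (85 is Excellent, 70 is Good).
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores taken from the benchmark list on this page.
for name, score in [("MMLU", 79.7), ("Lech Mazur Writing", 62.6), ("MMLU-PRO", 48.6)]:
    print(f"{name}: {score} -> {score_tier(score)}")
```

Under this reading, MMLU (79.7) falls in Good, Lech Mazur Writing (62.6) in Average, and MMLU-PRO (48.6) in Below.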
Specifications
  • Type: text
  • Context: 16K tokens (~12,000 words)
  • Released: Jan 2025
  • License: Open Source
  • Status: Active
  • Cost / Message: ~$0.000
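The per-message cost displays as ~$0.000 because the listed per-million-token prices are small relative to a typical message. A rough sketch of the calculation, using the $0.07 input and $0.14 output prices from this page; the 500-token message sizes are hypothetical, since the page does not state what message length it assumes.

```python
INPUT_USD_PER_MILLION = 0.07   # input price listed on this page
OUTPUT_USD_PER_MILLION = 0.14  # output price listed on this page


def message_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD of one request/response pair."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical message: 500 tokens in, 500 tokens out.
cost = message_cost_usd(500, 500)
print(f"${cost:.3f}")  # prints $0.000
print(f"${cost:.6f}")  # prints $0.000105
```

At these prices, roughly ten thousand such messages would cost about one dollar.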
Available On
Microsoft: $0.07
Phi 4 is an open-source text AI model by Microsoft, released in January 2025. It has an average benchmark score of 54.2. Context window: 16K tokens.