How much does Phi 4 Mini Instruct cost?

Phi 4 Mini Instruct is open source and can be self-hosted. Hosted input and output prices per 1M tokens are listed as TBD.

What benchmarks has Phi 4 Mini Instruct been tested on?

Phi 4 Mini Instruct has been evaluated on 9 benchmarks. Top scores: IFEval: 73.8, BBH (HuggingFace): 38.7, MMLU-PRO: 32.6.

Is Phi 4 Mini Instruct open source?

Yes, Phi 4 Mini Instruct is open source.

How does Phi 4 Mini Instruct compare to Mistral 7B V0.1?

Both Phi 4 Mini Instruct and Mistral 7B V0.1 show an average score of 48.9 at the displayed precision. The page places Mistral 7B V0.1 slightly ahead overall, which suggests any gap is smaller than 0.1 points.


Phi 4 Mini Instruct

Name: Phi 4 Mini Instruct
Author: Microsoft

by Microsoft · Released Feb 2025

Average score: 48.9
Rank: #111
Better than 52% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text-generation
License: Open Source
Benchmarks: 9 tested
Data updated today

About

Microsoft text generation model. 584K downloads on HuggingFace.

Tested on 9 benchmarks with 29.4% average. Top scores: IFEval (73.8%), BBH (HuggingFace) (38.7%), MMLU-PRO (32.6%).
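As a quick arithmetic check, the 29.4% average can be reproduced from the six individual scores that appear on this page. This is only a sketch: the page reports nine benchmarks but lists six scores, so the assumption here is that the average is an unweighted mean of those six. The same calculation gives the -13.8 gap to Phi 4 (43.2) shown in the model family listing. The separate "avg score" of 48.9 is a different figure and is not derived here.

```python
# Scores exactly as listed on this page.
scores = {
    "IFEval": 73.8,
    "BBH (HuggingFace)": 38.7,
    "MMLU-PRO": 32.6,
    "MATH Level 5": 17.0,
    "GPQA": 7.9,
    "MUSR": 6.5,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to one decimal place."""
    values = list(values)
    return round(sum(values) / len(values), 1)


average = mean_score(scores.values())

# Average listed for Phi 4 in the model family section of this page.
phi4_average = 43.2
delta = round(average - phi4_average, 1)

print(f"average = {average}, delta vs Phi 4 = {delta}")
```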

Capabilities

reasoning: 6.5 (#156 globally)
math: 17.0 (#177 globally)
knowledge: 20.3 (#197 globally)
language: 73.8 (#62 globally)
speed: 8.2 (#67 globally)
general: 38.7 (#19 globally)

Benchmark Scores


Tested on 9 benchmarks · Ranked across 6 categories

[Chart: Score Distribution (all 231 models), horizontal axis from 0 to 100, with this model's position marked.]
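The "Better than 52% of all models" figure is consistent with rank #111 out of the 231 models in the distribution. The sketch below shows one plausible way to derive it; the function name is hypothetical, and it assumes that the rank and the 231-model count refer to the same population, which the page does not state.

```python
def percent_better_than(rank, total):
    """Percentage of models ranked below `rank`, where rank 1 is the best."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100 * (total - rank) / total


# Rank #111 among 231 models, both taken from this page.
pct = percent_better_than(111, 231)
print(f"Better than {round(pct)}% of all models")
```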

reasoning

MUSR: 6.5
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.

math

MATH Level 5: 17.0
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.

knowledge

MMLU-PRO: 32.6
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.

GPQA: 7.9
HuggingFace evaluation of GPQA (Graduate-Level Google-Proof Q&A). PhD-level science questions that cannot be easily searched.

Quick compare:

vs Mistral 7B V0.1

vs Claude 3.7 Sonnet

vs Qwen3 4B Thinking 2507

Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
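The legend can be read as a simple threshold lookup. The sketch below is an illustration, not the site's own logic: the listed ranges share their boundary values (70 and 85), so it assumes each lower bound is inclusive, and the function name is hypothetical.

```python
def score_band(score):
    """Map a 0-100 score to a legend band, treating lower bounds as inclusive."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Category scores listed for Phi 4 Mini Instruct on this page.
for name, value in [("language", 73.8), ("general", 38.7), ("math", 17.0)]:
    print(f"{name}: {value} -> {score_band(value)}")
```

Under that assumption, the language score of 73.8 falls in the Good band, while the other category scores on this page fall in the Below band.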

Model Family · Microsoft Phi 4

Phi 4 (Jan 2025): score 43.2, $0.07/M input, 16K context, 16 benchmarks
Phi 4 Mini Instruct (Feb 2025): score 29.4 (-13.8 relative to Phi 4), price N/A, context N/A, 9 benchmarks
Phi 4 Multimodal Instruct (Feb 2025): price N/A, context N/A, 1 benchmark


Similar Models

Qwen3 4B Thinking 2507 (Alibaba): score 48.4, price TBD


BenchGecko API

microsoft-phi-4-mini-instruct

Specifications

Type: text-generation
Context: N/A
Released: Feb 2025
License: Open Source
Status: Active

Available On

Microsoft: TBD

Frequently Asked Questions

Phi 4 Mini Instruct is an open-source text-generation AI model by Microsoft, released in February 2025. It has an average benchmark score of 48.9.

