How much does Phi 2 cost?

Phi 2 is open source and can be self-hosted.

What benchmarks has Phi 2 been tested on?

Phi 2 has been evaluated on 14 benchmarks. Top scores: ARC AI2: 67.9, OpenBookQA: 64.8, BBH: 45.9.

How does Phi 2 compare to Gemma 2B?

Phi 2 has an average score of 31.0, while Gemma 2B scores 30.8, so Phi 2 comes out slightly ahead of Gemma 2B overall.

Phi 2

Name: Phi 2
Author: Microsoft
Released: Dec 2023
License: Open Source
Average score: 31.0
Rank: #173 (better than 25% of all models)
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text-generation
Benchmarks: 14 tested

About

A text-generation model from Microsoft, with 1759K downloads on Hugging Face.

Tested on 14 benchmarks with 30.2% average. Top scores: ARC AI2 (67.9%), OpenBookQA (64.8%), BBH (45.9%).

Capabilities

reasoning: 29.9 (#81 globally)
math: 3.0 (#200 globally)
knowledge: 33.9 (#165 globally)
language: 27.4 (#128 globally)
general: 28.0 (#33 globally)

Benchmark Scores

Tested on 14 benchmarks · Ranked across 5 categories

[Figure: score distribution across all 231 models on a 0-100 scale, with Phi 2's position marked.]
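The "better than 25% of all models" figure is consistent with rank #173 among the 231 models in the score distribution. A minimal Python sketch of that arithmetic follows. The formula (share of models ranked below) and the function name `percent_outranked` are assumptions for illustration, not BenchGecko's documented method.

```python
def percent_outranked(rank: int, total: int) -> float:
    """Share of models (in percent) ranked below the given rank.

    Assumption: rank 1 is the best model and ties are ignored.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank) / total


# Values taken from the page: rank #173 out of 231 models.
share = percent_outranked(173, 231)
print(f"Phi 2 outranks about {share:.1f}% of models")
```

Rounded to a whole number, this agrees with the 25% shown on the page.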

reasoning

BBH

BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.

Score: 45.9

MUSR

HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.

Score: 13.8

math

MATH Level 5

HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.

Score: 3.0

knowledge

ARC AI2

AI2 Reasoning Challenge. Grade-school science questions requiring multi-step reasoning. Easy and Challenge sets test different difficulty levels.

Score: 67.9

OpenBookQA

Elementary science questions with access to a small book of core science facts. Tests reasoning beyond memorization.

Score: 64.8

TriviaQA

Trivia questions sourced from trivia enthusiasts and quiz websites. Tests breadth of general knowledge.

Score: 45.2

Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
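The legend bands can be applied to the scores listed on this page. The sketch below is one possible reading of the legend: it assumes each band's lower bound is inclusive (the page does not say how a score of exactly 70 or 85 is treated), and `score_band` is a hypothetical helper name.

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to a legend band.

    Assumption: lower bounds are inclusive, so 85.0 counts as
    "Excellent" and 70.0 as "Good".
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores as listed on this page for Phi 2.
phi2_scores = {
    "ARC AI2": 67.9,
    "OpenBookQA": 64.8,
    "BBH": 45.9,
    "TriviaQA": 45.2,
    "MUSR": 13.8,
    "MATH Level 5": 3.0,
}

bands = {name: score_band(value) for name, value in phi2_scores.items()}
for name, band in bands.items():
    print(f"{name}: {phi2_scores[name]} ({band})")
```

Under this reading, ARC AI2 and OpenBookQA fall in the Average band and the other four listed benchmarks fall in Below.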

BenchGecko API: microsoft-phi-2

Specifications

Type: text-generation
Context: N/A
Released: Dec 2023
License: Open Source
Status: Active

Available On

Microsoft: TBD

Frequently Asked Questions

Phi 2 is an open-source text-generation AI model by Microsoft, released in December 2023. It has an average benchmark score of 31.0.
