Better than 82% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Open Source
Benchmarks: 10 tested
Data updated today
About
Tested on 10 benchmarks with 58.6% average. Top scores: ARC AI2 (88.8%), OpenBookQA (83.2%), HellaSwag (76.5%).
Capabilities
- reasoning: 75.2 (#16 globally)
- math: 17.6 (#174 globally)
- knowledge: 61.7 (#38 globally)
Benchmark Scores
Tested on 10 benchmarks · Ranked across 3 categories
Score Distribution (all 233 models): chart showing this model's position among the 233 tested models (scale 0-100).
reasoning
BBH: 75.2
BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.
math
MATH level 5: 17.6
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
knowledge
ARC AI2: 88.8
AI2 Reasoning Challenge. Grade-school science questions requiring multi-step reasoning. Easy and Challenge sets test different difficulty levels.
OpenBookQA: 83.2
Elementary science questions with access to a small book of core science facts. Tests reasoning beyond memorization.
HellaSwag: 76.5
Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.
Legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
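The score bands above are simple thresholds, and can be sketched as a small classifier (a minimal sketch; `score_band` is a hypothetical helper, not part of the BenchGecko API, with cutoffs taken from the legend):

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to its display band.

    Thresholds follow the legend: Excellent (85+), Good (70-85),
    Average (50-70), Below (<50).
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Applied to this model's scores:
# ARC AI2 (88.8) -> Excellent, BBH (75.2) -> Good,
# average (58.6) -> Average, MATH level 5 (17.6) -> Below
```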
Links
- Research
- Documentation
- Community
- Source Code
- BenchGecko API
phi-3-medium-14b
Specifications
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Open Source
- Status: benchmark-only
Frequently Asked Questions
phi-3-medium 14B is an open-source text AI model by Microsoft, released in January 2024. It has an average score of 58.6% across the 10 benchmarks tested.