Better than 8% of all models
- Context: N/A
- Input $/1M: TBD
- Output $/1M: TBD
- Type: text
- License: Open Source
- Benchmarks: 11 tested
Data updated today
About
Tested on 11 benchmarks with 16.3% average. Top scores: Winogrande (46.8%), HellaSwag (30.1%), ARC AI2 (25.9%).
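As a rough illustration, the "top scores" summary can be reproduced by sorting the per-benchmark scores. This sketch assumes only the five scores shown on this page (the other six of the 11 tested benchmarks are not listed here), and the names `scores`, `top3`, and `summary` are hypothetical, not part of any BenchGecko API.

```python
# Scores copied from the benchmark list on this page (5 of the 11 tested).
scores = {
    "MuSR": 3.4,
    "MATH Level 5": 1.8,
    "Winogrande": 46.8,
    "HellaSwag": 30.1,
    "ARC AI2": 25.9,
}

# Sort by score, highest first, and keep the three best.
top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]

# Format the same way the summary line does: "Name (score%)".
summary = ", ".join(f"{name} ({value}%)" for name, value in top3)
print(summary)
```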
Capabilities
- reasoning: 3.4 (#170 globally)
- math: 1.8 (#207 globally)
- knowledge: 20.8 (#198 globally)
- general: 7.5 (#57 globally)
- language: 20.3 (#140 globally)
Benchmark Scores
Tested on 11 benchmarks · Ranked across 5 categories
Score Distribution (all 233 models): chart on a 0 to 100 score axis, with this model's position marked.
reasoning
- MuSR: 3.4. HuggingFace MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.
math
- MATH Level 5: 1.8. HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
knowledge
- Winogrande: 46.8. Commonsense coreference resolution. Tests understanding of pronoun references in ambiguous sentences.
- HellaSwag: 30.1. Sentence completion requiring commonsense reasoning about physical and social situations. Tests real-world understanding.
- ARC AI2: 25.9. AI2 Reasoning Challenge. Grade-school science questions requiring multi-step reasoning. Easy and Challenge sets test different difficulty levels.
Score bands: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
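The score bands can be expressed as a small lookup, sketched below. The legend does not say which band a boundary value such as 70 or 85 belongs to, so this sketch assumes each boundary falls in the higher band; `score_band` and `bands` are hypothetical names used only for illustration.

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to a legend band.

    Assumption: boundary values (50, 70, 85) count toward the higher band.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores copied from the benchmark list on this page.
scores = {
    "MuSR": 3.4,
    "MATH Level 5": 1.8,
    "Winogrande": 46.8,
    "HellaSwag": 30.1,
    "ARC AI2": 25.9,
}

bands = {name: score_band(value) for name, value in scores.items()}
for name, band in bands.items():
    print(f"{name}: {band}")
```

Under these assumptions every score listed on this page falls in the Below band, since the highest (Winogrande, 46.8) is under 50.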
phi-1-5
Specifications
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Open Source
- Status: benchmark-only
Frequently Asked Questions
Phi-1.5 is an open-source text AI model by Microsoft, released in January 2024. It has an average benchmark score of 15.6.