Text generation model from Microsoft, with about 1,759K downloads on Hugging Face.
Evaluated on 14 benchmarks, with an average score of 30.2%. Top scores: ARC (AI2 Reasoning Challenge) 67.9%, OpenBookQA 64.8%, and BBH 45.9%.
BBH (BIG-Bench Hard). 23 challenging tasks from BIG-Bench on which prior language models scored below the average human rater.
MuSR (Multistep Soft Reasoning), as evaluated by Hugging Face. Tests multi-hop reasoning that requires chaining several facts together.
MATH Level 5, as evaluated by Hugging Face. The hardest tier of competition mathematics problems from the MATH dataset, requiring advanced multi-step mathematical reasoning.
ARC (AI2 Reasoning Challenge). Grade-school multiple-choice science questions requiring multi-step reasoning, split into an Easy set and a harder Challenge set.
OpenBookQA. Elementary-level science questions paired with a small "book" of core science facts; answering them requires applying those facts with broader knowledge, not just memorization.
Trivia questions sourced from trivia enthusiasts and quiz websites. Tests breadth of general knowledge.
- Type: text-generation
- Context: N/A
- Released: Dec 2023
- License: Open Source
- Status: Active