Meta's latest class of models (Llama 3) launched in a variety of sizes and flavors. This 8B instruction-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...
Tested on 16 benchmarks with a 30.8% average. Top scores: Chatbot Arena Elo, Overall (1222.2 rating), ARC AI2 (77.1%), OpenBookQA (76.8%).
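An Elo figure is a rating on an open-ended scale, so it is usually read through the gap between two ratings rather than as a share of 100. A minimal sketch, using the standard Elo expected-score formula; the 1100.0 opponent rating is a made-up value for illustration, and Chatbot Arena's own fitting procedure may differ from plain Elo.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (a win counts 1, a tie counts 0.5)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# 1222.2 is the Arena rating quoted above; 1100.0 is a hypothetical opponent.
p = elo_expected_score(1222.2, 1100.0)
print(f"expected score against a 1100-rated model: {p:.3f}")
```

Equal ratings give an expected score of 0.5, and every 400-point lead multiplies the odds by ten.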
Hugging Face MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Hugging Face evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests performance on competition mathematics.
AI2 Reasoning Challenge. Grade-school science questions requiring multi-step reasoning. Easy and Challenge sets test different difficulty levels.
Elementary science questions with access to a small book of core science facts. Tests reasoning beyond memorization.
Trivia questions sourced from trivia enthusiasts and quiz websites. Tests breadth of general knowledge.
- Type: text
- Context: 8K tokens (roughly 6,000 English words)
- Released: Apr 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000
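The context size listed above bounds the prompt and the reply together. A rough sketch of a budget check, assuming "8K" means 8,192 tokens and that English text averages about four characters per token; both figures are assumptions, and an exact count needs the model's own tokenizer. The names `estimate_tokens`, `fits_in_context`, and `reserve_for_reply` are hypothetical.

```python
CONTEXT_TOKENS = 8192    # assumption: "8K" taken as 8,192 tokens
CHARS_PER_TOKEN = 4.0    # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count; not a real tokenizer."""
    return int(len(text) / CHARS_PER_TOKEN) + 1


def fits_in_context(prompt: str, reserve_for_reply: int = 512) -> bool:
    """True if the estimated prompt size leaves room for the reply."""
    return estimate_tokens(prompt) + reserve_for_reply <= CONTEXT_TOKENS


print(fits_in_context("Summarize the ARC benchmark in two sentences."))
print(fits_in_context("word " * 10_000))
```

The heuristic is deliberately conservative about nothing: it only flags prompts that are clearly too long, so borderline cases should be counted with a real tokenizer.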