Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of...
Tested on 13 benchmarks with a 36.0% average. Top scores: Chatbot Arena Elo, Overall (1265 rating points, not a percentage), GSM8K (84.9%), IFEval (74.4%).
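The Chatbot Arena figure is a rating rather than an accuracy, so it reads differently from the other scores. As a rough sketch only: under the classical Elo convention (logistic curve, 400-point scale), a rating gap maps to an expected head-to-head win rate. The function name `expected_score` and the opponent rating of 1165 are hypothetical, and the leaderboard's own fitting procedure may differ from this textbook formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classical Elo expected score of A against B (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# 1265 is the rating reported above; 1165 is a hypothetical opponent rating.
print(round(expected_score(1265, 1165), 2))  # 0.64
```

Under this convention, a 100-point lead corresponds to winning about 64% of pairwise comparisons, and equal ratings give 50%.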
HuggingFace MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced multi-step reasoning.
Physical Interaction QA. Tests understanding of everyday physical interactions and commonsense physics.
Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely-cited knowledge benchmark.
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.
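The change from 4 to 10 answer choices also lowers the score a model would get by guessing uniformly at random, which is worth keeping in mind when comparing MMLU and MMLU-Pro numbers. A minimal sketch of that arithmetic (the helper name `chance_accuracy` is hypothetical, and it assumes every question has the same number of choices):

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on multiple-choice items."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 1.0 / num_choices


print(f"4 choices:  {chance_accuracy(4):.0%}")   # 25%
print(f"10 choices: {chance_accuracy(10):.0%}")  # 10%
```

A score on the 10-choice format is therefore measured against a lower guessing floor than the same score on the 4-choice format.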
- Type: text
- Context: 8K tokens (roughly 6,000 English words)
- Released: Jun 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000