Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...
Tested on 13 benchmarks, with a 40.1% average. Top scores: Chatbot Arena Elo (Overall) 1288.7, GSM8K 84.9%, IFEval 79.8%.
Qwen2.5 7B Instruct scores 57.4 (100% as good) at $0.04/1M input · 94% cheaper
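The "94% cheaper" figure can be turned back into an approximate baseline price. The sketch below is illustrative only: it assumes the percentage compares input prices per 1M tokens and is measured against Gemma 2 27B's input price, which this page does not state. The function name `implied_baseline_price` is hypothetical.

```python
def implied_baseline_price(cheaper_price: float, percent_cheaper: float) -> float:
    """Return the baseline price implied by a 'percent cheaper' comparison.

    If the alternative costs `cheaper_price` and is `percent_cheaper` percent
    cheaper than the baseline, then:
        cheaper_price = baseline * (1 - percent_cheaper / 100)
    """
    if not 0 <= percent_cheaper < 100:
        raise ValueError("percent_cheaper must be in [0, 100)")
    return cheaper_price / (1 - percent_cheaper / 100)


# Values from the listing; applying the percentage to input price is an assumption.
qwen_input_price = 0.04  # USD per 1M input tokens
percent_cheaper = 94

baseline = implied_baseline_price(qwen_input_price, percent_cheaper)
print(f"Implied baseline input price: ${baseline:.2f} per 1M tokens")
```

Under these assumptions the implied baseline is roughly $0.67 per 1M input tokens. Because the listed percentage is presumably rounded, treat the result as an estimate rather than a quoted price.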
HuggingFace MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced multi-step reasoning.
Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely-cited knowledge benchmark.
Physical Interaction QA (PIQA). Tests understanding of everyday physical interactions and commonsense physics.
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.
- Type: text
- Context: 8K tokens (~6,000 words)
- Released: Jul 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.002