Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...
Tested on 11 benchmarks with 32.9% average. Top scores: Chatbot Arena Elo — Overall (1287.6%), IFEval (79.8%), MMLU (67.6%).
Qwen3 32B scores 51.7 (100% as good) at $0.08/1M input · 88% cheaper
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely-cited knowledge benchmark.
HuggingFace MMLU-Pro. Harder version of MMLU with 10 answer choices instead of 4 and more challenging questions.
HuggingFace evaluation of GPQA (Graduate-Level Google-Proof Q&A). PhD-level science questions that cannot be easily searched.
- Typetext
- Context8K tokens (~4 books)
- ReleasedJul 2024
- LicenseOpen Source
- StatusActive
- Cost / Message~$0.002