Gemma 3 introduces multimodality, supporting vision-language input and text output. It handles context windows of up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, ...
Tested on 14 benchmarks, with an average score of 42.2%. Top scores: Chatbot Arena Elo (Overall): 1365.1; OpenCompass IFEval: 81.0%; Lech Mazur Writing: 79.9%.
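The Chatbot Arena figure is a rating, not a percentage, so it only means something relative to other models' ratings. The sketch below interprets a rating gap using the conventional Elo expected-score formula (base 10, 400-point spread). This is an assumption made for illustration: the leaderboard's exact fitting procedure is not described here, and the opponent ratings in the example are hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the conventional Elo scale.

    Assumes the standard logistic curve with base 10 and a 400-point spread;
    the result is a win probability with ties counted as half a win.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# 1365.1 is the rating from the listing; the opponent ratings are hypothetical.
model_rating = 1365.1
for opponent_rating in (1265.1, 1365.1, 1465.1):
    p = elo_expected_score(model_rating, opponent_rating)
    print(f"vs {opponent_rating:.1f}: expected score {p:.2f}")
```

Under this formula, equal ratings give an expected score of exactly 0.5, and a 100-point lead corresponds to roughly 0.64.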
OpenCompass LiveCodeBench v6. Recently published competitive programming problems, used to evaluate code generation while limiting the benefit of memorized training data.
Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests performance on competition mathematics.
Writing quality evaluation by Lech Mazur. Tests prose quality, coherence, and stylistic ability.
OpenCompass MMLU-Pro evaluation. A harder knowledge test than MMLU, with ten answer choices per question instead of four.
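One reason more answer choices make the test harder is that they lower the score achievable by guessing. A minimal sketch, assuming every question has exactly one correct answer and a fixed number of options (4 for MMLU, 10 for MMLU-Pro):

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on a single-answer multiple-choice item."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 1.0 / num_choices


for name, k in (("MMLU (4 options)", 4), ("MMLU-Pro (10 options)", 10)):
    # Prints the random-guess baseline as a percentage.
    print(f"{name}: chance accuracy = {chance_accuracy(k):.0%}")
```

The random-guess baseline drops from 25% to 10%, so a given MMLU-Pro score sits further above chance than the same score on MMLU.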
Geography benchmark testing knowledge of world geography, landmarks, borders, and geopolitical facts.
- Type: Multimodal
- Context: 131K tokens (~66 books)
- Released: Mar 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000
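The "131K" context figure and the "128k" in the description are the same window: 128 × 1024 = 131,072 tokens. The sketch below checks whether a request fits. It assumes that prompt tokens and generated tokens share one window, which is typical for decoder-only models, and that token counts come from the model's own tokenizer (not shown). The function name `fits_in_context` is hypothetical.

```python
# 128 * 1024 = 131,072 tokens, i.e. the "128k" / "131K" figure.
CONTEXT_WINDOW = 128 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the requested generation budget fits in the window.

    Assumes input and output tokens count against the same window.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_window


print(CONTEXT_WINDOW)                   # 131072
print(fits_in_context(120_000, 8_192))  # 128192 <= 131072, so True
print(fits_in_context(128_000, 4_096))  # 132096 > 131072, so False
```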