GLM-5 is Z.ai’s flagship open-source foundation model, engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading proprietary models.
Tested on 28 benchmarks with a 57.6% average score. Top scores: Chatbot Arena Elo — Overall (1455.6), Chatbot Arena Elo — Coding (1441.0), OpenCompass — AIME 2025 (95.8%).
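Arena Elo values are relative ratings rather than percentages, so a number like 1455.6 is only meaningful through its gap to other models. A minimal sketch of the standard Elo expected-score formula; the 1355.6 comparison rating below is illustrative, not a real leaderboard entry:

```python
# Elo ratings encode predicted head-to-head win rates, not absolute quality.
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B in a pairwise vote."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# e.g. a 1455.6-rated model vs. one rated 100 points lower:
print(f"{elo_expected_score(1455.6, 1355.6):.2f}")  # ~0.64 -> preferred ~64% of the time
```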
For comparison, Gemma 4 31B scores 68.2 (98% as good) at $0.13 per 1M input tokens, 78% cheaper.
OpenCompass LiveCodeBench v6. Fresh competitive-programming problems that evaluate code generation without memorization.
Regularly refreshed coding problems; new problems are added monthly to avoid data contamination and memorization.
Real-world software engineering tasks from GitHub issues. Models must diagnose bugs and write patches that pass test suites. Human-verified subset of SWE-bench.
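As an illustration of how such a harness works, here is a minimal sketch of scoring one SWE-bench-style task in Python. The function and parameter names, the use of `git apply` and `pytest`, and the simplified pass criterion are assumptions; the real benchmark pins per-repository environments and also re-runs previously passing tests to catch regressions.

```python
import subprocess

def evaluate_patch(repo_dir: str, patch_file: str, fail_to_pass: list[str]) -> bool:
    """Apply a model-generated patch and check that the target tests now pass."""
    # Apply the model's patch to a clean checkout at the issue's base commit.
    applied = subprocess.run(
        ["git", "apply", patch_file], cwd=repo_dir, capture_output=True
    )
    if applied.returncode != 0:
        return False  # patch does not apply cleanly -> task failed

    # Run the tests that the issue's gold patch is known to fix (FAIL_TO_PASS).
    result = subprocess.run(
        ["python", "-m", "pytest", *fail_to_pass], cwd=repo_dir, capture_output=True
    )
    return result.returncode == 0  # all target tests pass -> task resolved
```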
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
Abstraction and Reasoning Corpus (ARC). Tests fluid intelligence through novel visual pattern-recognition puzzles, designed as a core measure of general intelligence.
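To make the task format concrete, the sketch below encodes a toy ARC-style task as train/test grid pairs and accepts a candidate rule only if it reproduces every training output. The grids and the mirror rule are illustrative, not taken from the actual corpus.

```python
# An ARC task gives a few input->output grid pairs; the solver must infer the
# transformation from the "train" pairs and apply it to the "test" input.
Grid = list[list[int]]  # cell values are colors 0-9

task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [{"input": [[3, 3], [0, 3]]}],
}

def mirror(grid: Grid) -> Grid:
    # Candidate rule: flip each row left-to-right.
    return [row[::-1] for row in grid]

# A rule is accepted only if it reproduces every training output exactly.
if all(mirror(p["input"]) == p["output"] for p in task["train"]):
    print(mirror(task["test"][0]["input"]))  # [[3, 3], [3, 0]]
```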
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests performance on competition mathematics.
- Type: text
- Context: 203K tokens (~101 books)
- Released: Feb 2026
- License: Open Source
- Status: Active
- Cost / Message: ~$0.003 (see the sketch after this list)
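The per-message figure presumably comes from per-token pricing multiplied by a typical message size. A rough sketch of that arithmetic; the per-1M-token prices and token counts below are assumptions chosen to land near the quoted ~$0.003, not published GLM-5 pricing.

```python
# Rough cost-per-message arithmetic with illustrative numbers.
PRICE_PER_1M_INPUT = 0.60   # USD per 1M input tokens (assumed)
PRICE_PER_1M_OUTPUT = 2.20  # USD per 1M output tokens (assumed)

def cost_per_message(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request given token counts and per-1M-token prices."""
    return (
        input_tokens * PRICE_PER_1M_INPUT / 1_000_000
        + output_tokens * PRICE_PER_1M_OUTPUT / 1_000_000
    )

# An average chat turn of ~1,500 input + ~950 output tokens lands near $0.003.
print(f"${cost_per_message(1_500, 950):.4f}")  # $0.0030
```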