GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning and execution. It demonstrates significant improvements in executing complex agent tasks while...
Tested on 26 benchmarks with a 50.5% average score. Top scores: Chatbot Arena Elo — Overall (1442.7), Chatbot Arena Elo — Coding (1439.2), OpenCompass — AIME 2025 (95.4%). Note that the Arena figures are Elo ratings, not percentages.
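For intuition about what an Elo rating difference means, here is a minimal sketch using the standard Elo expected-score formula. The 1400 comparison rating is an illustrative assumption, and Chatbot Arena's actual rating aggregation may differ from plain Elo.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# GLM-4.7's overall Arena rating vs. a hypothetical 1400-rated model:
print(f"{elo_expected_score(1442.7, 1400.0):.3f}")  # ~0.561
```

In other words, a ~43-point Elo gap corresponds to winning roughly 56% of head-to-head comparisons, so small rating differences near the top of the leaderboard are meaningful but not dramatic.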
- OpenCompass LiveCodeBench v6: fresh competitive programming problems that evaluate code generation without memorization.
- Regularly refreshed coding problems that avoid data contamination; new problems are added monthly to prevent memorization.
- LiveBench coding tasks that require multi-step reasoning and tool use; tests planning and execution of complex coding workflows.
- Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
- Fresh data-analysis tasks testing the ability to interpret tables, charts, and statistical data.
- Deceptively simple questions that humans find easy but AI models often get wrong; tests common-sense reasoning gaps.
- OpenCompass evaluation on AIME 2025 problems; tests mathematical reasoning on fresh competition problems.
- Mock AIME (American Invitational Mathematics Exam) problems from OTIS; tests mathematical competition performance.
- Regularly updated math problems covering numerical reasoning, algebra, calculus, and combinatorics.
- Type: text
- Context: 203K tokens (~101 books)
- Released: Dec 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.003
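For a rough sense of where a per-message figure like ~$0.003 comes from, here is a minimal sketch. The per-token prices and token counts below are illustrative assumptions, not Z.ai's published pricing.

```python
# Assumed prices; real pricing may differ.
INPUT_PRICE_PER_MTOK = 0.60   # assumed $ per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 2.20  # assumed $ per 1M output tokens

def cost_per_message(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one chat turn given token counts and assumed prices."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# e.g. a typical turn with ~1,000 input and ~1,000 output tokens:
print(f"${cost_per_message(1_000, 1_000):.4f}")  # ~$0.0028
```

Under these assumed prices, a turn with about a thousand tokens each way lands near the quoted ~$0.003; longer prompts or responses scale the cost linearly.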