Compared with GLM-4.5, this generation brings several key improvements. Longer context window: expanded from 128K to 200K tokens, enabling the model to handle more complex...
Tested on 20 benchmarks with a 50.8% average score. Top scores: Chatbot Arena Elo — Overall (1425.8), Chatbot Arena Elo — Coding (1353.7), OpenCompass — AIME2025 (90.3%).
Phi 4 scores 54.2 (100% as good) at $0.07/1M input tokens · 83% cheaper
OpenCompass Live Code Bench v6. Fresh competitive programming problems to evaluate code generation without memorization.
Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.
LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
- Type: text
- Context: 205K tokens (~102 books)
- Released: Sep 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.003
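A per-message figure like the one above is normally derived from per-million-token pricing and an assumed average message size. The sketch below shows that arithmetic; the token counts and the dollar rates in it are illustrative assumptions, not published prices for this model:

```python
def cost_per_message(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of a single chat turn from per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical values chosen only to illustrate the calculation:
# a 1,000-token prompt, a 1,000-token reply, $0.60/1M input, $2.20/1M output.
estimate = cost_per_message(input_tokens=1_000, output_tokens=1_000,
                            input_price_per_m=0.60, output_price_per_m=2.20)
print(f"${estimate:.4f}")  # prints "$0.0028", i.e. roughly ~$0.003 per message
```

Real per-message cost varies with prompt length, reply length, and any cached-context discounts, so listings like this one typically quote an order-of-magnitude average.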