GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...
Tested on 12 benchmarks, with a 70.2% average. Top scores: Chatbot Arena Elo (Overall): 1467.4 Elo; LiveBench (Mathematics): 84.9%; LiveBench (Coding): 75.4%.
For comparison, Qwen3.6 Plus scores 88.7 (102% of GLM-5.1's score) at $0.33 per 1M input tokens, which is 66% cheaper.
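The comparison figures can be inverted to see roughly what they imply about GLM-5.1. The sketch below is only an illustration under two assumptions: that "102% as good" is a plain ratio of scores on the same (unnamed) benchmark, and that "66% cheaper" refers to the input-token price relative to GLM-5.1. The function names are hypothetical, and because the listed percentages are rounded, the results are approximate.

```python
def implied_baseline_score(other_score: float, relative_pct: float) -> float:
    """Baseline score, assuming other_score is relative_pct percent of it."""
    return other_score / (relative_pct / 100.0)


def implied_baseline_price(other_price: float, cheaper_pct: float) -> float:
    """Baseline price, assuming other_price is cheaper_pct percent below it."""
    return other_price / (1.0 - cheaper_pct / 100.0)


# Figures taken from the comparison line; the interpretation is an assumption.
score = implied_baseline_score(88.7, 102.0)
price = implied_baseline_price(0.33, 66.0)

print(f"implied GLM-5.1 score: {score:.1f}")                           # 87.0
print(f"implied GLM-5.1 input price: ${price:.2f} per 1M tokens")      # $0.97
```

Read this way, the comparison suggests a GLM-5.1 score near 87 and an input price near $0.97 per 1M tokens, though neither value is stated directly in the listing.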
Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.
LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
- Type: text
- Context: 203K tokens (~101 books)
- Released: Apr 2026
- License: Open Source
- Status: Active
- Cost / Message: ~$0.005
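As a rough budgeting aid, the per-message figure above can be scaled to a workload. This is a minimal sketch that assumes the listed ~$0.005 holds as a flat average per message; real cost depends on message length, which the listing does not specify. The constant and function names are hypothetical.

```python
# Approximate per-message cost from the listing (assumed to be a flat average).
COST_PER_MESSAGE_USD = 0.005


def estimated_cost(messages: int, cost_per_message: float = COST_PER_MESSAGE_USD) -> float:
    """Estimated total cost in USD for a given number of messages."""
    return messages * cost_per_message


print(f"1,000 messages: ~${estimated_cost(1_000):.2f}")    # ~$5.00
print(f"50,000 messages: ~${estimated_cost(50_000):.2f}")  # ~$250.00
```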