Kimi K2.5 vs GLM 4.7
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Kimi K2.5 wins 13 of 16 shared benchmarks, leading in the agentic, knowledge, and math categories.
Category leads
agentic · Kimi K2.5
knowledge · Kimi K2.5
math · Kimi K2.5
language · Kimi K2.5
coding · GLM 4.7
reasoning · GLM 4.7
Hype vs Reality
Attention vs performance
Kimi K2.5 · #85 by performance · no signal
GLM 4.7 · #91 by performance · no signal
Best value
Kimi K2.5 · about 1.05x the points per dollar of GLM 4.7
Kimi K2.5 · 49.5 pts/$ · $1.05 per 1M tokens
GLM 4.7 · 47.2 pts/$ · $1.07 per 1M tokens
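The value ratio follows directly from the points-per-dollar figures above; the page does not spell out which aggregate score feeds the "pts" numerator, so treat this as a sketch of the ratio only:

```python
# Points-per-dollar comparison, using the figures from the Best value card above.
kimi_pts_per_dollar = 49.5   # Kimi K2.5
glm_pts_per_dollar = 47.2    # GLM 4.7

ratio = kimi_pts_per_dollar / glm_pts_per_dollar
print(f"Kimi K2.5 delivers {ratio:.2f}x the points per dollar of GLM 4.7")
# -> Kimi K2.5 delivers 1.05x the points per dollar of GLM 4.7
```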
Vendor risk
Who is behind the model
Kimi K2.5 · moonshotai · private · undisclosed
GLM 4.7 · z-ai · private · undisclosed
Head to head
16 benchmarks · 2 models
APEX-Agents
Kimi K2.5 leads by +11.3
APEX-Agents · evaluates AI agents on complex, multi-step tasks requiring planning, tool use, and autonomous decision-making in realistic environments.
Kimi K2.5: 14.4 · GLM 4.7: 3.1
Chess Puzzles
Kimi K2.5 leads by +6.0
Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.
Kimi K2.5: 12.0 · GLM 4.7: 6.0
FrontierMath-2025-02-28-Private
Kimi K2.5 leads by +25.5
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
Kimi K2.5: 27.9 · GLM 4.7: 2.4
FrontierMath-Tier-4-2025-07-01-Private
Kimi K2.5 leads by +4.1
FrontierMath Tier 4 (Jul 2025) · the most challenging tier of frontier mathematics, containing problems that push the absolute limits of AI mathematical reasoning.
Kimi K2.5: 4.2 · GLM 4.7: 0.1
GPQA diamond
Kimi K2.5 leads by +5.7
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Kimi K2.5: 83.5 · GLM 4.7: 77.8
OpenCompass · AIME2025
GLM 4.7 leads by +3.5
Kimi K2.5: 91.9 · GLM 4.7: 95.4
OpenCompass · GPQA-Diamond
Kimi K2.5 leads by +1.2
Kimi K2.5: 88.1 · GLM 4.7: 86.9
OpenCompass · HLE
Kimi K2.5 leads by +3.2
Kimi K2.5: 28.6 · GLM 4.7: 25.4
OpenCompass · IFEval
Kimi K2.5 leads by +3.7
Kimi K2.5: 93.9 · GLM 4.7: 90.2
OpenCompass · LiveCodeBenchV6
GLM 4.7 leads by +3.2
Kimi K2.5: 80.6 · GLM 4.7: 83.8
OpenCompass · MMLU-Pro
Kimi K2.5 leads by +2.2
Kimi K2.5: 86.2 · GLM 4.7: 84.0
OTIS Mock AIME 2024-2025
Kimi K2.5 leads by +8.9
OTIS Mock AIME 2024–2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Kimi K2.5: 92.2 · GLM 4.7: 83.3
PostTrainBench
Kimi K2.5 leads by +2.8
Kimi K2.5: 10.3 · GLM 4.7: 7.5
SimpleBench
GLM 4.7 leads by +1.0
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Kimi K2.5: 36.2 · GLM 4.7: 37.2
SimpleQA Verified
Kimi K2.5 leads by +2.4
SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.
Kimi K2.5: 33.9 · GLM 4.7: 31.5
Terminal Bench
Kimi K2.5 leads by +9.8
Terminal Bench · tests the ability to accomplish real-world tasks using terminal commands, evaluating shell scripting and CLI tool proficiency.
Kimi K2.5: 43.2 · GLM 4.7: 33.4
Full benchmark table
| Benchmark | Kimi K2.5 | GLM 4.7 |
|---|---|---|
| APEX-Agents | 14.4 | 3.1 |
| Chess Puzzles | 12.0 | 6.0 |
| FrontierMath-2025-02-28-Private | 27.9 | 2.4 |
| FrontierMath-Tier-4-2025-07-01-Private | 4.2 | 0.1 |
| GPQA diamond | 83.5 | 77.8 |
| OpenCompass · AIME2025 | 91.9 | 95.4 |
| OpenCompass · GPQA-Diamond | 88.1 | 86.9 |
| OpenCompass · HLE | 28.6 | 25.4 |
| OpenCompass · IFEval | 93.9 | 90.2 |
| OpenCompass · LiveCodeBenchV6 | 80.6 | 83.8 |
| OpenCompass · MMLU-Pro | 86.2 | 84.0 |
| OTIS Mock AIME 2024-2025 | 92.2 | 83.3 |
| PostTrainBench | 10.3 | 7.5 |
| SimpleBench | 36.2 | 37.2 |
| SimpleQA Verified | 33.9 | 31.5 |
| Terminal Bench | 43.2 | 33.4 |
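For readers who want to sanity-check the winner summary, a minimal Python sketch that reproduces the 13-of-16 tally from this table; scores are transcribed from the rows above, and it assumes higher is better, which holds for every benchmark listed:

```python
# Reproduce the head-to-head tally from the full benchmark table.
# Scores are (Kimi K2.5, GLM 4.7); higher is better on every benchmark here.
scores = {
    "APEX-Agents": (14.4, 3.1),
    "Chess Puzzles": (12.0, 6.0),
    "FrontierMath-2025-02-28-Private": (27.9, 2.4),
    "FrontierMath-Tier-4-2025-07-01-Private": (4.2, 0.1),
    "GPQA diamond": (83.5, 77.8),
    "OpenCompass AIME2025": (91.9, 95.4),
    "OpenCompass GPQA-Diamond": (88.1, 86.9),
    "OpenCompass HLE": (28.6, 25.4),
    "OpenCompass IFEval": (93.9, 90.2),
    "OpenCompass LiveCodeBenchV6": (80.6, 83.8),
    "OpenCompass MMLU-Pro": (86.2, 84.0),
    "OTIS Mock AIME 2024-2025": (92.2, 83.3),
    "PostTrainBench": (10.3, 7.5),
    "SimpleBench": (36.2, 37.2),
    "SimpleQA Verified": (33.9, 31.5),
    "Terminal Bench": (43.2, 33.4),
}

kimi_wins = sum(1 for kimi, glm in scores.values() if kimi > glm)
print(f"Kimi K2.5 wins {kimi_wins} of {len(scores)} shared benchmarks")
# -> Kimi K2.5 wins 13 of 16 shared benchmarks
```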
Pricing · per 1M tokens · projected $/mo at 10M tokens
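The page lists no further pricing rows here, but the projection named in the heading follows from the $/M rates in the Best value card. A minimal sketch, assuming those blended rates apply to all 10M tokens (actual cost depends on the input/output token split, which the page does not break down):

```python
# Projected monthly spend at 10M tokens, using the blended per-1M-token rates above.
MONTHLY_TOKENS_IN_MILLIONS = 10

for model, price_per_million in [("Kimi K2.5", 1.05), ("GLM 4.7", 1.07)]:
    projected = price_per_million * MONTHLY_TOKENS_IN_MILLIONS
    print(f"{model}: ${projected:.2f}/mo")
# -> Kimi K2.5: $10.50/mo
# -> GLM 4.7: $10.70/mo
```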