Kimi K2.5 vs GLM 4.7
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Kimi K2.5 wins 13 of 16 shared benchmarks, leading in the agentic, knowledge, and math categories.
Category leads
agentic · Kimi K2.5
knowledge · Kimi K2.5
math · Kimi K2.5
language · Kimi K2.5
coding · GLM 4.7
reasoning · GLM 4.7
Hype vs Reality
Attention vs performance
Kimi K2.5 · #85 by performance · no signal
GLM 4.7 · #91 by performance · no signal
Best value
Kimi K2.5 · about 1.05x the points per dollar of GLM 4.7
Kimi K2.5 · 49.5 pts/$ · $1.05 per 1M tokens
GLM 4.7 · 47.2 pts/$ · $1.07 per 1M tokens
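The value ratio follows directly from the points-per-dollar figures above; the page does not spell out which aggregate score feeds the "pts" numerator, so treat this as a sketch of the ratio only:

```python
# Points-per-dollar comparison, using the figures from the Best value card above.
kimi_pts_per_dollar = 49.5   # Kimi K2.5
glm_pts_per_dollar = 47.2    # GLM 4.7

ratio = kimi_pts_per_dollar / glm_pts_per_dollar
print(f"Kimi K2.5 delivers {ratio:.2f}x the points per dollar of GLM 4.7")
# -> Kimi K2.5 delivers 1.05x the points per dollar of GLM 4.7
```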
Vendor risk
Who is behind the model
Kimi K2.5 · moonshotai · private · undisclosed
GLM 4.7 · z-ai · private · undisclosed
Head to head
16 benchmarks · 2 models
APEX-Agents
Kimi K2.5 leads by +11.3
APEX-Agents · evaluates AI agents on complex, multi-step tasks requiring planning, tool use, and autonomous decision-making in realistic environments.
Kimi K2.5: 14.4 · GLM 4.7: 3.1
Chess Puzzles
Kimi K2.5 leads by +6.0
Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.
Kimi K2.5: 12.0 · GLM 4.7: 6.0
FrontierMath-2025-02-28-Private
Kimi K2.5 leads by +25.5
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
Kimi K2.5: 27.9 · GLM 4.7: 2.4
FrontierMath-Tier-4-2025-07-01-Private
Kimi K2.5 leads by +4.1
FrontierMath Tier 4 (Jul 2025) · the most challenging tier of frontier mathematics, containing problems that push the absolute limits of AI mathematical reasoning.
Kimi K2.5: 4.2 · GLM 4.7: 0.1
GPQA diamond
Kimi K2.5 leads by +5.7
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Kimi K2.5: 83.5 · GLM 4.7: 77.8
OpenCompass · AIME2025
GLM 4.7 leads by +3.5
Kimi K2.5: 91.9 · GLM 4.7: 95.4
OpenCompass · GPQA-Diamond
Kimi K2.5 leads by +1.2
Kimi K2.5: 88.1 · GLM 4.7: 86.9
OpenCompass · HLE
Kimi K2.5 leads by +3.2
Kimi K2.5: 28.6 · GLM 4.7: 25.4
OpenCompass · IFEval
Kimi K2.5 leads by +3.7
Kimi K2.5: 93.9 · GLM 4.7: 90.2
OpenCompass · LiveCodeBenchV6
GLM 4.7 leads by +3.2
Kimi K2.5: 80.6 · GLM 4.7: 83.8
OpenCompass · MMLU-Pro
Kimi K2.5 leads by +2.2
Kimi K2.5: 86.2 · GLM 4.7: 84.0
OTIS Mock AIME 2024-2025
Kimi K2.5 leads by +8.9
OTIS Mock AIME 2024–2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Kimi K2.5: 92.2 · GLM 4.7: 83.3
PostTrainBench
Kimi K2.5 leads by +2.8
Kimi K2.5: 10.3 · GLM 4.7: 7.5
SimpleBench
GLM 4.7 leads by +1.0
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Kimi K2.5: 36.2 · GLM 4.7: 37.2
SimpleQA Verified
Kimi K2.5 leads by +2.4
SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.
Kimi K2.5: 33.9 · GLM 4.7: 31.5
Terminal Bench
Kimi K2.5 leads by +9.8
Terminal Bench · tests the ability to accomplish real-world tasks using terminal commands, evaluating shell scripting and CLI tool proficiency.
Kimi K2.5: 43.2 · GLM 4.7: 33.4
Full benchmark table
| Benchmark | Kimi K2.5 | GLM 4.7 |
|---|---|---|
| APEX-Agents | 14.4 | 3.1 |
| Chess Puzzles | 12.0 | 6.0 |
| FrontierMath-2025-02-28-Private | 27.9 | 2.4 |
| FrontierMath-Tier-4-2025-07-01-Private | 4.2 | 0.1 |
| GPQA diamond | 83.5 | 77.8 |
| OpenCompass · AIME2025 | 91.9 | 95.4 |
| OpenCompass · GPQA-Diamond | 88.1 | 86.9 |
| OpenCompass · HLE | 28.6 | 25.4 |
| OpenCompass · IFEval | 93.9 | 90.2 |
| OpenCompass · LiveCodeBenchV6 | 80.6 | 83.8 |
| OpenCompass · MMLU-Pro | 86.2 | 84.0 |
| OTIS Mock AIME 2024-2025 | 92.2 | 83.3 |
| PostTrainBench | 10.3 | 7.5 |
| SimpleBench | 36.2 | 37.2 |
| SimpleQA Verified | 33.9 | 31.5 |
| Terminal Bench | 43.2 | 33.4 |
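For readers who want to sanity-check the winner summary, a minimal Python sketch that reproduces the 13-of-16 tally from this table; scores are transcribed from the rows above, and it assumes higher is better, which holds for every benchmark listed:

```python
# Reproduce the head-to-head tally from the full benchmark table.
# Scores are (Kimi K2.5, GLM 4.7); higher is better on every benchmark here.
scores = {
    "APEX-Agents": (14.4, 3.1),
    "Chess Puzzles": (12.0, 6.0),
    "FrontierMath-2025-02-28-Private": (27.9, 2.4),
    "FrontierMath-Tier-4-2025-07-01-Private": (4.2, 0.1),
    "GPQA diamond": (83.5, 77.8),
    "OpenCompass AIME2025": (91.9, 95.4),
    "OpenCompass GPQA-Diamond": (88.1, 86.9),
    "OpenCompass HLE": (28.6, 25.4),
    "OpenCompass IFEval": (93.9, 90.2),
    "OpenCompass LiveCodeBenchV6": (80.6, 83.8),
    "OpenCompass MMLU-Pro": (86.2, 84.0),
    "OTIS Mock AIME 2024-2025": (92.2, 83.3),
    "PostTrainBench": (10.3, 7.5),
    "SimpleBench": (36.2, 37.2),
    "SimpleQA Verified": (33.9, 31.5),
    "Terminal Bench": (43.2, 33.4),
}

kimi_wins = sum(1 for kimi, glm in scores.values() if kimi > glm)
print(f"Kimi K2.5 wins {kimi_wins} of {len(scores)} shared benchmarks")
# -> Kimi K2.5 wins 13 of 16 shared benchmarks
```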
Pricing · per 1M tokens · projected $/mo at 10M tokens
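The page lists no further pricing rows here, but the projection named in the heading follows from the $/M rates in the Best value card. A minimal sketch, assuming those blended rates apply to all 10M tokens (actual cost depends on the input/output token split, which the page does not break down):

```python
# Projected monthly spend at 10M tokens, using the blended per-1M-token rates above.
MONTHLY_TOKENS_IN_MILLIONS = 10

for model, price_per_million in [("Kimi K2.5", 1.05), ("GLM 4.7", 1.07)]:
    projected = price_per_million * MONTHLY_TOKENS_IN_MILLIONS
    print(f"{model}: ${projected:.2f}/mo")
# -> Kimi K2.5: $10.50/mo
# -> GLM 4.7: $10.70/mo
```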