Kimi K2.5 vs GLM 5
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Kimi K2.5 wins on 11/18 benchmarks
Kimi K2.5 wins 11 of 18 shared benchmarks. Leads in reasoning · knowledge · math.
Category leads
reasoning · Kimi K2.5
knowledge · Kimi K2.5
math · Kimi K2.5
language · Kimi K2.5
coding · GLM 5
Hype vs Reality
Attention vs performance
Kimi K2.5
#85 by performance · no attention signal
GLM 5
#53 by performance · #27 by attention
Best value
Kimi K2.5
1.3x better value than GLM 5
Kimi K2.5
49.5 pts/$
$1.05/M
GLM 5
38.1 pts/$
$1.51/M
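For readers who want to reproduce the value ratio, a minimal sketch follows. It uses only the published pts/$ and $/M figures above; how the composite "pts" score itself is derived is not stated on this page, so treat that structure as an assumption.

```python
# Sketch: reproduce the "best value" comparison from the card above.
# Assumption: pts/$ is a composite benchmark score divided by a blended
# price per 1M tokens; only the published figures are used here.
models = {
    "Kimi K2.5": {"pts_per_dollar": 49.5, "price_per_m": 1.05},
    "GLM 5":     {"pts_per_dollar": 38.1, "price_per_m": 1.51},
}

ratio = models["Kimi K2.5"]["pts_per_dollar"] / models["GLM 5"]["pts_per_dollar"]
print(f"Value ratio: {ratio:.1f}x")  # -> 1.3x better value for Kimi K2.5
```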
Vendor risk
Who is behind the model
moonshotai
private · undisclosed
z-ai
private · undisclosed
Head to head
18 benchmarks · 2 models
Kimi K2.5 · GLM 5
ARC-AGI
Kimi K2.5 leads by +20.7
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
Kimi K2.5
65.3
GLM 5
44.7
ARC-AGI-2
Kimi K2.5 leads by +7.0
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
Kimi K2.5
11.8
GLM 5
4.9
Chess Puzzles
Kimi K2.5 leads by +2.0
Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.
Kimi K2.5
12.0
GLM 5
10.0
FrontierMath-2025-02-28-Private
Kimi K2.5 leads by +11.5
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
Kimi K2.5
27.9
GLM 5
16.4
FrontierMath-Tier-4-2025-07-01-Private
Kimi K2.5 leads by +2.1
FrontierMath Tier 4 (Jul 2025) · the most challenging tier of frontier mathematics, containing problems that push the absolute limits of AI mathematical reasoning.
Kimi K2.5
4.2
GLM 5
2.1
GPQA diamond
GLM 5 leads by +0.3
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Kimi K2.5
83.5
GLM 5
83.8
OpenCompass · AIME2025
GLM 5 leads by +3.9
Kimi K2.5
91.9
GLM 5
95.8
OpenCompass · GPQA-Diamond
Kimi K2.5 leads by +2.8
Kimi K2.5
88.1
GLM 5
85.3
OpenCompass · HLE
Kimi K2.5 leads by +0.5
Kimi K2.5
28.6
GLM 5
28.1
OpenCompass · IFEval
Kimi K2.5 leads by +0.7
Kimi K2.5
93.9
GLM 5
93.2
OpenCompass · LiveCodeBenchV6
GLM 5 leads by +5.6
Kimi K2.5
80.6
GLM 5
86.2
OpenCompass · MMLU-Pro
Kimi K2.5 leads by +1.0
Kimi K2.5
86.2
GLM 5
85.2
OTIS Mock AIME 2024-2025
Kimi K2.5 leads by +12.2
OTIS Mock AIME 2024–2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Kimi K2.5
92.2
GLM 5
80.0
PostTrainBench
GLM 5 leads by +3.6
Kimi K2.5
10.3
GLM 5
13.9
SimpleBench
GLM 5 leads by +7.7
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
Kimi K2.5
36.2
GLM 5
43.8
SWE-Bench verified
Kimi K2.5 leads by +1.7
Kimi K2.5
73.8
GLM 5
72.1
Terminal Bench
GLM 5 leads by +9.2
Terminal Bench · tests the ability to accomplish real-world tasks using terminal commands, evaluating shell scripting and CLI tool proficiency.
Kimi K2.5
43.2
GLM 5
52.4
WeirdML
GLM 5 leads by +2.6
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Kimi K2.5
45.6
GLM 5
48.2
Full benchmark table
| Benchmark | Kimi K2.5 | GLM 5 |
|---|---|---|
| ARC-AGI | 65.3 | 44.7 |
| ARC-AGI-2 | 11.8 | 4.9 |
| Chess Puzzles | 12.0 | 10.0 |
| FrontierMath-2025-02-28-Private | 27.9 | 16.4 |
| FrontierMath-Tier-4-2025-07-01-Private | 4.2 | 2.1 |
| GPQA diamond | 83.5 | 83.8 |
| OpenCompass · AIME2025 | 91.9 | 95.8 |
| OpenCompass · GPQA-Diamond | 88.1 | 85.3 |
| OpenCompass · HLE | 28.6 | 28.1 |
| OpenCompass · IFEval | 93.9 | 93.2 |
| OpenCompass · LiveCodeBenchV6 | 80.6 | 86.2 |
| OpenCompass · MMLU-Pro | 86.2 | 85.2 |
| OTIS Mock AIME 2024-2025 | 92.2 | 80.0 |
| PostTrainBench | 10.3 | 13.9 |
| SimpleBench | 36.2 | 43.8 |
| SWE-Bench verified | 73.8 | 72.1 |
| Terminal Bench | 43.2 | 52.4 |
| WeirdML | 45.6 | 48.2 |
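As a quick sanity check on the winner summary, the sketch below tallies per-benchmark wins straight from the table; the scores are copied verbatim, and the tally reproduces the 11-of-18 headline.

```python
# Tally head-to-head wins from the shared-benchmark table above.
# Scores are copied from the table as (Kimi K2.5, GLM 5) pairs.
scores = {
    "ARC-AGI": (65.3, 44.7),
    "ARC-AGI-2": (11.8, 4.9),
    "Chess Puzzles": (12.0, 10.0),
    "FrontierMath-2025-02-28-Private": (27.9, 16.4),
    "FrontierMath-Tier-4-2025-07-01-Private": (4.2, 2.1),
    "GPQA diamond": (83.5, 83.8),
    "OpenCompass AIME2025": (91.9, 95.8),
    "OpenCompass GPQA-Diamond": (88.1, 85.3),
    "OpenCompass HLE": (28.6, 28.1),
    "OpenCompass IFEval": (93.9, 93.2),
    "OpenCompass LiveCodeBenchV6": (80.6, 86.2),
    "OpenCompass MMLU-Pro": (86.2, 85.2),
    "OTIS Mock AIME 2024-2025": (92.2, 80.0),
    "PostTrainBench": (10.3, 13.9),
    "SimpleBench": (36.2, 43.8),
    "SWE-Bench verified": (73.8, 72.1),
    "Terminal Bench": (43.2, 52.4),
    "WeirdML": (45.6, 48.2),
}

kimi_wins = sum(1 for k, g in scores.values() if k > g)
glm_wins = sum(1 for k, g in scores.values() if g > k)
print(f"Kimi K2.5 wins {kimi_wins}/{len(scores)} · GLM 5 wins {glm_wins}/{len(scores)}")
# -> Kimi K2.5 wins 11/18 · GLM 5 wins 7/18
```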
Pricing · per 1M tokens · projected $/mo at 10M tokens
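At 10M tokens per month, the projection described in the header is a simple scaling of the per-million price. A minimal sketch, assuming the $/M figures from the Best value card above are blended rates (actual spend depends on the input/output token mix):

```python
# Project monthly cost at 10M tokens/month from a blended $/1M-token price.
# Assumption: the $/M figures from the "Best value" card combine input and
# output pricing; real bills vary with the token mix.
blended_price_per_m = {"Kimi K2.5": 1.05, "GLM 5": 1.51}
monthly_tokens_m = 10  # 10M tokens per month, as stated in the header above

for model, price in blended_price_per_m.items():
    print(f"{model}: ~${price * monthly_tokens_m:.2f}/mo")
# -> Kimi K2.5: ~$10.50/mo, GLM 5: ~$15.10/mo
```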