GLM 4.7 vs Kimi K2 Thinking
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Kimi K2 Thinking wins 13 of the 23 shared benchmarks, leading in the agentic, knowledge, and language categories.
Category leads
agentic · Kimi K2 Thinking
knowledge · Kimi K2 Thinking
math · GLM 4.7
coding · GLM 4.7
reasoning · GLM 4.7
language · Kimi K2 Thinking
Hype vs Reality
Attention vs performance
GLM 4.7 · #91 by perf · no signal
Kimi K2 Thinking · #77 by perf · no signal
Best value
GLM 4.7 · 1.4x better value than Kimi K2 Thinking
GLM 4.7 · 47.2 pts/$ · $1.07/M
Kimi K2 Thinking · 34.4 pts/$ · $1.55/M
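The value figures above can be reproduced under one assumption (not stated on the page, so treat it as a guess): the $/M figure is the simple average of the input and output prices from the pricing table below, and the "1.4x" is the ratio of the two pts/$ scores. A minimal sketch:

```python
# Hedged sketch: assumes the listed $/M is the mean of input and output
# prices, and the value multiple is the ratio of the pts/$ scores.
glm_in, glm_out = 0.39, 1.75        # GLM 4.7, $ per 1M tokens
kimi_in, kimi_out = 0.60, 2.50      # Kimi K2 Thinking, $ per 1M tokens

glm_blended = (glm_in + glm_out) / 2      # 1.07 $/M, matches the page
kimi_blended = (kimi_in + kimi_out) / 2   # 1.55 $/M, matches the page

value_multiple = 47.2 / 34.4              # pts/$ ratio, ~1.37, shown as "1.4x"
print(round(glm_blended, 2), round(kimi_blended, 2), round(value_multiple, 1))
```

Both blended prices land exactly on the page's figures, which is why the 3:1 interpretation is not needed here; the pts/$ numerator (average benchmark points) is not published, so only the ratio is checkable.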
Vendor risk
Who is behind the model
z-ai (GLM 4.7) · private · undisclosed
moonshotai (Kimi K2 Thinking) · private · undisclosed
Head to head
23 benchmarks · 2 models
GLM 4.7 · Kimi K2 Thinking
APEX-Agents
Kimi K2 Thinking leads by +0.9
APEX-Agents · evaluates AI agents on complex, multi-step tasks requiring planning, tool use, and autonomous decision-making in realistic environments.
GLM 4.7
3.1
Kimi K2 Thinking
4.0
Chess Puzzles
Kimi K2 Thinking leads by +14.0
Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.
GLM 4.7
6.0
Kimi K2 Thinking
20.0
FrontierMath-2025-02-28-Private
Kimi K2 Thinking leads by +19.0
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
GLM 4.7
2.4
Kimi K2 Thinking
21.4
FrontierMath-Tier-4-2025-07-01-Private
FrontierMath Tier 4 (Jul 2025) · the most challenging tier of frontier mathematics, containing problems that push the absolute limits of AI mathematical reasoning.
GLM 4.7
0.1
Kimi K2 Thinking
0.1
GPQA diamond
Kimi K2 Thinking leads by +1.2
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
GLM 4.7
77.8
Kimi K2 Thinking
79.0
LiveBench · Agentic Coding
GLM 4.7 leads by +3.4
GLM 4.7
41.7
Kimi K2 Thinking
38.3
LiveBench · Coding
GLM 4.7 leads by +5.7
GLM 4.7
73.1
Kimi K2 Thinking
67.4
LiveBench · Data Analysis
GLM 4.7 leads by +2.9
GLM 4.7
55.2
Kimi K2 Thinking
52.3
LiveBench · IF (Instruction Following)
Kimi K2 Thinking leads by +26.3
GLM 4.7
35.7
Kimi K2 Thinking
62.0
LiveBench · Language
Kimi K2 Thinking leads by +1.3
GLM 4.7
65.2
Kimi K2 Thinking
66.5
LiveBench · Mathematics
Kimi K2 Thinking leads by +5.1
GLM 4.7
76.0
Kimi K2 Thinking
81.1
LiveBench · Overall
Kimi K2 Thinking leads by +3.5
GLM 4.7
58.1
Kimi K2 Thinking
61.6
LiveBench · Reasoning
Kimi K2 Thinking leads by +3.8
GLM 4.7
59.7
Kimi K2 Thinking
63.5
OpenCompass · AIME2025
GLM 4.7 leads by +1.3
GLM 4.7
95.4
Kimi K2 Thinking
94.1
OpenCompass · GPQA-Diamond
GLM 4.7 leads by +4.2
GLM 4.7
86.9
Kimi K2 Thinking
82.7
OpenCompass · HLE
GLM 4.7 leads by +4.1
GLM 4.7
25.4
Kimi K2 Thinking
21.3
OpenCompass · IFEval
Kimi K2 Thinking leads by +2.2
GLM 4.7
90.2
Kimi K2 Thinking
92.4
OpenCompass · LiveCodeBenchV6
GLM 4.7 leads by +6.7
GLM 4.7
83.8
Kimi K2 Thinking
77.1
OpenCompass · MMLU-Pro
Kimi K2 Thinking leads by +0.3
GLM 4.7
84.0
Kimi K2 Thinking
84.3
OTIS Mock AIME 2024-2025
GLM 4.7 leads by +0.3
OTIS Mock AIME 2024–2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
GLM 4.7
83.3
Kimi K2 Thinking
83.0
PostTrainBench
GLM 4.7 leads by +0.2
GLM 4.7
7.5
Kimi K2 Thinking
7.3
SimpleQA Verified
Kimi K2 Thinking leads by +0.1
SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.
GLM 4.7
31.5
Kimi K2 Thinking
31.6
Terminal Bench
Kimi K2 Thinking leads by +2.3
Terminal Bench · tests the ability to accomplish real-world tasks using terminal commands, evaluating shell scripting and CLI tool proficiency.
GLM 4.7
33.4
Kimi K2 Thinking
35.7
Full benchmark table
| Benchmark | GLM 4.7 | Kimi K2 Thinking |
|---|---|---|
| APEX-Agents | 3.1 | 4.0 |
| Chess Puzzles | 6.0 | 20.0 |
| FrontierMath-2025-02-28-Private | 2.4 | 21.4 |
| FrontierMath-Tier-4-2025-07-01-Private | 0.1 | 0.1 |
| GPQA diamond | 77.8 | 79.0 |
| LiveBench · Agentic Coding | 41.7 | 38.3 |
| LiveBench · Coding | 73.1 | 67.4 |
| LiveBench · Data Analysis | 55.2 | 52.3 |
| LiveBench · IF | 35.7 | 62.0 |
| LiveBench · Language | 65.2 | 66.5 |
| LiveBench · Mathematics | 76.0 | 81.1 |
| LiveBench · Overall | 58.1 | 61.6 |
| LiveBench · Reasoning | 59.7 | 63.5 |
| OpenCompass · AIME2025 | 95.4 | 94.1 |
| OpenCompass · GPQA-Diamond | 86.9 | 82.7 |
| OpenCompass · HLE | 25.4 | 21.3 |
| OpenCompass · IFEval | 90.2 | 92.4 |
| OpenCompass · LiveCodeBenchV6 | 83.8 | 77.1 |
| OpenCompass · MMLU-Pro | 84.0 | 84.3 |
| OTIS Mock AIME 2024–2025 | 83.3 | 83.0 |
| PostTrainBench | 7.5 | 7.3 |
| SimpleQA Verified | 31.5 | 31.6 |
| Terminal Bench | 33.4 | 35.7 |
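As a sanity check on the 13-of-23 headline, the table can be tallied programmatically (scores copied verbatim from the benchmark table; the FrontierMath Tier 4 tie counts for neither model):

```python
# (GLM 4.7, Kimi K2 Thinking) score pairs, copied from the benchmark table.
scores = {
    "APEX-Agents": (3.1, 4.0),
    "Chess Puzzles": (6.0, 20.0),
    "FrontierMath-2025-02-28-Private": (2.4, 21.4),
    "FrontierMath-Tier-4-2025-07-01-Private": (0.1, 0.1),
    "GPQA diamond": (77.8, 79.0),
    "LiveBench Agentic Coding": (41.7, 38.3),
    "LiveBench Coding": (73.1, 67.4),
    "LiveBench Data Analysis": (55.2, 52.3),
    "LiveBench IF": (35.7, 62.0),
    "LiveBench Language": (65.2, 66.5),
    "LiveBench Mathematics": (76.0, 81.1),
    "LiveBench Overall": (58.1, 61.6),
    "LiveBench Reasoning": (59.7, 63.5),
    "OpenCompass AIME2025": (95.4, 94.1),
    "OpenCompass GPQA-Diamond": (86.9, 82.7),
    "OpenCompass HLE": (25.4, 21.3),
    "OpenCompass IFEval": (90.2, 92.4),
    "OpenCompass LiveCodeBenchV6": (83.8, 77.1),
    "OpenCompass MMLU-Pro": (84.0, 84.3),
    "OTIS Mock AIME 2024-2025": (83.3, 83.0),
    "PostTrainBench": (7.5, 7.3),
    "SimpleQA Verified": (31.5, 31.6),
    "Terminal Bench": (33.4, 35.7),
}

glm_wins = sum(g > k for g, k in scores.values())
kimi_wins = sum(k > g for g, k in scores.values())
ties = sum(g == k for g, k in scores.values())
print(glm_wins, kimi_wins, ties)  # 9 13 1 -> Kimi K2 Thinking wins 13 of 23
```

The tally agrees with the winner summary: 13 wins for Kimi K2 Thinking, 9 for GLM 4.7, and one tie.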
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| GLM 4.7 | $0.39 | $1.75 | 203K tokens (~101 books) | $7.30 |
| Kimi K2 Thinking | $0.60 | $2.50 | 262K tokens (~131 books) | $10.75 |
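The projected-$/mo column is consistent with one particular traffic mix: a hypothetical 3:1 input-to-output split of the 10M tokens (7.5M input + 2.5M output) reproduces both listed figures. The page does not state its mix, so this is an inference, sketched below:

```python
# Hypothetical assumption: 10M monthly tokens split 75% input / 25% output.
def projected_monthly_cost(input_price, output_price,
                           total_mtok=10.0, input_share=0.75):
    """Cost in $ for total_mtok million tokens at the given $/M prices."""
    return total_mtok * (input_share * input_price
                         + (1 - input_share) * output_price)

print(projected_monthly_cost(0.39, 1.75))  # GLM 4.7: matches $7.30
print(projected_monthly_cost(0.60, 2.50))  # Kimi K2 Thinking: matches $10.75
```

A different mix (e.g. 50/50) would not reproduce the listed values, which is why the 3:1 split seems to be the generator's assumption.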