Kimi K2 0711 vs Kimi K2.5

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Kimi K2.5 wins 4 of 4 shared benchmarks and leads in knowledge, reasoning, and coding.

Category leads

knowledge · Kimi K2.5
reasoning · Kimi K2.5
coding · Kimi K2.5

Hype vs Reality

Attention vs performance

Kimi K2 0711

#63 by perf · no signal

QUIET

Kimi K2.5

#87 by perf · no signal

QUIET

Best value

Kimi K2.5

1.1x better value than Kimi K2 0711

Kimi K2 0711

39.2 pts/$

$1.43/M

Kimi K2.5

42.6 pts/$

$1.22/M
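The blended $/M figures above line up with the simple mean of each model's input and output price from the pricing table, and the 1.1x claim lines up with the ratio of the two pts/$ values. The sketch below checks that arithmetic. The averaging rule is inferred from the numbers and is not stated on the page, and `blended_price` is a hypothetical helper name.

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Mean of input and output price per 1M tokens (assumed rule, not confirmed by the page)."""
    return (input_price + output_price) / 2


# Prices per 1M tokens, copied from the pricing table.
k2_0711_blended = blended_price(0.57, 2.30)  # about 1.435; the page shows $1.43
k2_5_blended = blended_price(0.44, 2.00)     # about 1.22; the page shows $1.22

# Value ratio from the pts/$ figures shown on the page.
value_ratio = 42.6 / 39.2  # roughly 1.09, which the page rounds to 1.1x

print(f"{k2_0711_blended:.3f} {k2_5_blended:.3f} {value_ratio:.3f}")
```

How the page derives the points in pts/$ is not shown, so this sketch takes the two pts/$ values as given rather than recomputing them.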

Vendor risk

Who is behind the model

moonshotai

private · undisclosed

Unknown

moonshotai

private · undisclosed

Unknown

Head to head

4 benchmarks · 2 models

Kimi K2 0711 · Kimi K2.5

Fiction.LiveBench

Kimi K2.5 leads by +25.0

Fiction.LiveBench · a continuously updated benchmark that uses recently published fiction to test reading comprehension and reasoning; the fresh material guards against data contamination.

Kimi K2 0711

61.1

Kimi K2.5

86.1

SimpleBench

Kimi K2.5 leads by +24.6

SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.

Kimi K2 0711

11.6

Kimi K2.5

36.2

Terminal-Bench 2.0

Kimi K2.5 leads by +15.4

Terminal-Bench 2.0 · evaluates AI agents on real terminal-based coding tasks: writing scripts, debugging, running tests, and managing projects entirely through command-line interaction. It tests both code quality and terminal fluency. For reference, Claude Opus 4.7 scores 69.4% on this benchmark.

Kimi K2 0711

27.8

Kimi K2.5

43.2

WeirdML

Kimi K2.5 leads by +6.2

WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.

Kimi K2 0711

39.4

Kimi K2.5

45.6

Full benchmark table

Benchmark	Kimi K2 0711	Kimi K2.5
Fiction.LiveBench	61.1	86.1
SimpleBench	11.6	36.2
Terminal-Bench 2.0	27.8	43.2
WeirdML	39.4	45.6
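The per-benchmark leads and the 4 of 4 win count can be reproduced directly from the table. A minimal sketch, using only the scores listed above; the variable and function names are illustrative and not part of any published tooling.

```python
# (Kimi K2 0711, Kimi K2.5) scores copied from the benchmark table.
scores = {
    "Fiction.LiveBench": (61.1, 86.1),
    "SimpleBench": (11.6, 36.2),
    "Terminal-Bench 2.0": (27.8, 43.2),
    "WeirdML": (39.4, 45.6),
}


def lead_margins(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the K2.5 score minus the K2 0711 score, rounded to one decimal place."""
    return {name: round(k2_5 - k2_0711, 1) for name, (k2_0711, k2_5) in table.items()}


margins = lead_margins(scores)
k2_5_wins = sum(1 for margin in margins.values() if margin > 0)

for name, margin in margins.items():
    print(f"{name}: Kimi K2.5 leads by {margin:+.1f}")
print(f"Kimi K2.5 wins {k2_5_wins} of {len(scores)}")
```

Rounding to one decimal place avoids floating-point noise in the subtraction, so the margins match the figures shown in the head-to-head cards.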

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input	Output	Context	Projected $/mo
Kimi K2 0711	$0.57	$2.30	131K tokens (~66 books)	$10.03
Kimi K2.5	$0.44	$2.00	262K tokens (~131 books)	$8.30
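The page does not say how the 10M tokens are divided between input and output. Both projected figures are consistent with a split of 75% input and 25% output, so the sketch below uses that ratio. Treat the split as an assumption inferred from the numbers, and `projected_monthly_cost` as a hypothetical helper name.

```python
def projected_monthly_cost(
    input_price: float,
    output_price: float,
    total_tokens_m: float = 10.0,
    input_share: float = 0.75,  # assumed split, inferred from the table, not stated on the page
) -> float:
    """Monthly cost in dollars for a token volume given in millions of tokens."""
    input_tokens_m = total_tokens_m * input_share
    output_tokens_m = total_tokens_m - input_tokens_m
    return input_tokens_m * input_price + output_tokens_m * output_price


# Prices per 1M tokens, copied from the pricing table.
k2_0711_cost = projected_monthly_cost(0.57, 2.30)  # about 10.025; the table shows $10.03
k2_5_cost = projected_monthly_cost(0.44, 2.00)     # about 8.30; the table shows $8.30

print(f"Kimi K2 0711: ${k2_0711_cost:.3f}")
print(f"Kimi K2.5:    ${k2_5_cost:.3f}")
```

A workload with a different input and output mix will shift both totals. Because output tokens cost several times more than input tokens for both models, an output-heavy workload raises the projected cost noticeably; pass a lower `input_share` to see the effect.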
