Kimi K2 Thinking vs GPT-5 Mini

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Kimi K2 Thinking wins on 9/16 benchmarks

Kimi K2 Thinking wins 9 of 16 shared benchmarks and leads in knowledge, coding, and reasoning. GPT-5 Mini wins the other 7 and leads in math and language.
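The 9-of-16 tally can be checked directly against the head-to-head scores. The snippet below is a minimal sketch, not the site's own method: the `SCORES` dictionary and the `tally_wins` helper are illustrative names, and the numbers are copied from the benchmark list on this page (Kimi K2 Thinking first, GPT-5 Mini second).

```python
# Scores copied from the head-to-head section of this page.
# Each value is (Kimi K2 Thinking, GPT-5 Mini). Names are illustrative.
SCORES = {
    "FrontierMath-2025-02-28-Private": (21.4, 27.2),
    "FrontierMath-Tier-4-2025-07-01-Private": (0.1, 6.3),
    "GPQA diamond": (79.0, 66.7),
    "LiveBench · Agentic Coding": (38.3, 35.0),
    "LiveBench · Coding": (67.4, 76.1),
    "LiveBench · Data Analysis": (52.3, 49.6),
    "LiveBench · IF": (62.0, 64.2),
    "LiveBench · Language": (66.5, 69.2),
    "LiveBench · Mathematics": (81.1, 74.4),
    "LiveBench · Overall": (61.6, 61.0),
    "LiveBench · Reasoning": (63.5, 58.6),
    "OTIS Mock AIME 2024-2025": (83.0, 86.7),
    "SimpleQA Verified": (31.6, 21.0),
    "SWE-Bench Verified (Bash Only)": (63.4, 59.8),
    "Terminal Bench": (35.7, 34.8),
    "WeirdML": (42.8, 52.7),
}


def tally_wins(scores):
    """Return (kimi_wins, gpt_wins, ties) for a dict of score pairs."""
    kimi_wins = sum(1 for kimi, gpt in scores.values() if kimi > gpt)
    gpt_wins = sum(1 for kimi, gpt in scores.values() if gpt > kimi)
    ties = len(scores) - kimi_wins - gpt_wins
    return kimi_wins, gpt_wins, ties


kimi_wins, gpt_wins, ties = tally_wins(SCORES)
print(f"Kimi K2 Thinking: {kimi_wins}, GPT-5 Mini: {gpt_wins}, ties: {ties}")
```

A raw win count treats a 0.6-point lead the same as a 12.3-point lead, so it is worth reading alongside the per-benchmark margins.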

Category leads

math · GPT-5 Mini
knowledge · Kimi K2 Thinking
coding · Kimi K2 Thinking
reasoning · Kimi K2 Thinking
language · GPT-5 Mini

Hype vs Reality

Attention vs performance

Kimi K2 Thinking

#79 by performance · no signal

QUIET

GPT-5 Mini

#65 by performance · no signal

QUIET

Best value

GPT-5 Mini

1.4x better value than Kimi K2 Thinking

Kimi K2 Thinking

34.4 pts/$

$1.55/M

GPT-5 Mini

49.8 pts/$

$1.13/M
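The per-model "$/M" figures and the 1.4x claim can be reproduced from numbers on this page. The sketch below assumes the "$/M" figure is a simple average of the input and output prices in the pricing table; that assumption matches the listed $1.55 and $1.13 but is not stated by the source. The names `PRICES`, `POINTS_PER_DOLLAR`, `blended_price`, and `value_ratio` are illustrative.

```python
# Input and output prices per 1M tokens, copied from the pricing table.
PRICES = {
    "Kimi K2 Thinking": (0.60, 2.50),
    "GPT-5 Mini": (0.25, 2.00),
}

# Points per dollar, copied from the "Best value" card.
POINTS_PER_DOLLAR = {
    "Kimi K2 Thinking": 34.4,
    "GPT-5 Mini": 49.8,
}


def blended_price(input_price, output_price):
    """Assumed blend: unweighted mean of input and output price."""
    return (input_price + output_price) / 2


def value_ratio(better, worse):
    """How many times more points per dollar `better` delivers."""
    return better / worse


for model, (inp, out) in PRICES.items():
    print(f"{model}: ${blended_price(inp, out):.3f} per 1M tokens (blended)")

ratio = value_ratio(
    POINTS_PER_DOLLAR["GPT-5 Mini"], POINTS_PER_DOLLAR["Kimi K2 Thinking"]
)
print(f"GPT-5 Mini value advantage: {ratio:.2f}x")
```

Under this assumption GPT-5 Mini's blend is $1.125, which the page shows rounded to $1.13, and the ratio 49.8 / 34.4 comes out just under 1.45, consistent with the displayed 1.4x.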

Vendor risk

Who is behind the model

moonshotai

private · undisclosed

Unknown risk

OpenAI

$840.0B·Tier 1

Medium risk

Head to head

16 benchmarks · 2 models

Kimi K2 Thinking · GPT-5 Mini

FrontierMath-2025-02-28-Private

GPT-5 Mini leads by +5.8

FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.

Kimi K2 Thinking

21.4

GPT-5 Mini

27.2

FrontierMath-Tier-4-2025-07-01-Private

GPT-5 Mini leads by +6.2

FrontierMath Tier 4 (Jul 2025) · the most challenging tier of frontier mathematics, containing problems that push the absolute limits of AI mathematical reasoning.

Kimi K2 Thinking

0.1

GPT-5 Mini

6.3

GPQA diamond

Kimi K2 Thinking leads by +12.3

Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.

Kimi K2 Thinking

79.0

GPT-5 Mini

66.7

LiveBench · Agentic Coding

Kimi K2 Thinking leads by +3.3

Kimi K2 Thinking

38.3

GPT-5 Mini

35.0

LiveBench · Coding

GPT-5 Mini leads by +8.6

Kimi K2 Thinking

67.4

GPT-5 Mini

76.1

LiveBench · Data Analysis

Kimi K2 Thinking leads by +2.7

Kimi K2 Thinking

52.3

GPT-5 Mini

49.6

LiveBench · IF (Instruction Following)

GPT-5 Mini leads by +2.2

Kimi K2 Thinking

62.0

GPT-5 Mini

64.2

LiveBench · Language

GPT-5 Mini leads by +2.7

Kimi K2 Thinking

66.5

GPT-5 Mini

69.2

LiveBench · Mathematics

Kimi K2 Thinking leads by +6.7

Kimi K2 Thinking

81.1

GPT-5 Mini

74.4

LiveBench · Overall

Kimi K2 Thinking leads by +0.6

Kimi K2 Thinking

61.6

GPT-5 Mini

61.0

LiveBench · Reasoning

Kimi K2 Thinking leads by +4.8

Kimi K2 Thinking

63.5

GPT-5 Mini

58.6

OTIS Mock AIME 2024-2025

GPT-5 Mini leads by +3.6

OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.

Kimi K2 Thinking

83.0

GPT-5 Mini

86.7

SimpleQA Verified

Kimi K2 Thinking leads by +10.6

SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.

Kimi K2 Thinking

31.6

GPT-5 Mini

21.0

SWE-Bench Verified (Bash Only)

Kimi K2 Thinking leads by +3.6

SWE-Bench Verified (Bash Only) · a curated subset of SWE-bench where models fix real Python repository bugs using only bash commands, no agent frameworks.

Kimi K2 Thinking

63.4

GPT-5 Mini

59.8

Terminal Bench

Kimi K2 Thinking leads by +0.9

Terminal-Bench 2.0 · evaluates AI agents on real terminal-based coding tasks: writing scripts, debugging, running tests, and managing projects entirely through command-line interaction. Tests both code quality and terminal fluency. Claude Opus 4.7 scores 69.4%, demonstrating significant agentic terminal competence.

Kimi K2 Thinking

35.7

GPT-5 Mini

34.8

WeirdML

GPT-5 Mini leads by +9.9

WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.

Kimi K2 Thinking

42.8

GPT-5 Mini

52.7

Full benchmark table

Benchmark	Kimi K2 Thinking	GPT-5 Mini
FrontierMath-2025-02-28-Private	21.4	27.2
FrontierMath-Tier-4-2025-07-01-Private	0.1	6.3
GPQA diamond	79.0	66.7
LiveBench · Agentic Coding	38.3	35.0
LiveBench · Coding	67.4	76.1
LiveBench · Data Analysis	52.3	49.6
LiveBench · IF (Instruction Following)	62.0	64.2
LiveBench · Language	66.5	69.2
LiveBench · Mathematics	81.1	74.4
LiveBench · Overall	61.6	61.0
LiveBench · Reasoning	63.5	58.6
OTIS Mock AIME 2024-2025	83.0	86.7
SimpleQA Verified	31.6	21.0
SWE-Bench Verified (Bash Only)	63.4	59.8
Terminal Bench	35.7	34.8
WeirdML	42.8	52.7

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input ($/1M tokens)	Output ($/1M tokens)	Context	Projected $/mo (10M tokens)
Kimi K2 Thinking	$0.60	$2.50	262K tokens (~131 books)	$10.75
GPT-5 Mini	$0.25	$2.00	400K tokens (~200 books)	$6.88
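The projected monthly figures are consistent with a workload of 10M tokens split 75% input and 25% output. The source does not state that split; it is an assumption inferred from the listed prices and totals. A minimal sketch under that assumption, with `projected_monthly_cost` and its parameters as hypothetical names:

```python
def projected_monthly_cost(
    input_price, output_price, total_millions=10.0, input_share=0.75
):
    """Monthly cost in dollars for a given token volume.

    Prices are dollars per 1M tokens. `input_share` is the assumed
    fraction of tokens that are input; 0.75 is inferred, not sourced.
    """
    input_millions = total_millions * input_share
    output_millions = total_millions - input_millions
    return input_price * input_millions + output_price * output_millions


kimi = projected_monthly_cost(0.60, 2.50)
gpt = projected_monthly_cost(0.25, 2.00)
print(f"Kimi K2 Thinking: ${kimi:.3f}/mo")
print(f"GPT-5 Mini: ${gpt:.3f}/mo")
```

With these defaults the function gives $10.75 for Kimi K2 Thinking and $6.875 for GPT-5 Mini, which the table shows as $6.88. Output-heavy workloads narrow the gap, since the two models' output prices ($2.50 vs $2.00) are closer than their input prices ($0.60 vs $0.25); adjust `input_share` to match your own traffic.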
