gpt-oss-120b vs Kimi K2 0711

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Kimi K2 0711 wins on 8/11 benchmarks

Kimi K2 0711 wins 8 of 11 shared benchmarks and leads in coding, knowledge, language, and reasoning; gpt-oss-120b leads in math.

Category leads

coding · Kimi K2 0711
knowledge · Kimi K2 0711
language · Kimi K2 0711
math · gpt-oss-120b
reasoning · Kimi K2 0711

Hype vs Reality

Attention vs performance

gpt-oss-120b

#106 by perf·no signal

QUIET

Kimi K2 0711

#61 by perf·no signal

QUIET

Best value

gpt-oss-120b

10.5x better value than Kimi K2 0711

gpt-oss-120b

409.6 pts/$

$0.11/M

Kimi K2 0711

39.2 pts/$

$1.43/M
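
The "better value" multiple above can be sanity-checked by dividing the two points-per-dollar figures. The sketch below uses only the rounded values shown in this panel; the function and variable names are illustrative, and how the page derives its "pts" score is not stated, so that part is not reproduced. From the rounded figures the ratio comes out near 10.45x, close to the 10.5x headline; the small gap is presumably due to rounding of the displayed inputs, which is an assumption.

```python
# Points-per-dollar figures copied from the "Best value" panel above.
PTS_PER_DOLLAR = {"gpt-oss-120b": 409.6, "Kimi K2 0711": 39.2}


def value_ratio(better: str, worse: str) -> float:
    """Return how many times more points per dollar `better` delivers than `worse`."""
    return PTS_PER_DOLLAR[better] / PTS_PER_DOLLAR[worse]


ratio = value_ratio("gpt-oss-120b", "Kimi K2 0711")
print(f"gpt-oss-120b delivers about {ratio:.2f}x the points per dollar")
```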

Vendor risk

Who is behind the model

OpenAI

$840.0B·Tier 1

Medium risk

moonshotai

private · undisclosed

Unknown

Head to head

11 benchmarks · 2 models

gpt-oss-120b · Kimi K2 0711

Aider polyglot

Kimi K2 0711 leads by +17.3

Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.

gpt-oss-120b

41.8

Kimi K2 0711

59.1

Fiction.LiveBench

Kimi K2 0711 leads by +16.7

Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.

gpt-oss-120b

44.4

Kimi K2 0711

61.1

HELM · GPQA

gpt-oss-120b leads by +3.2

gpt-oss-120b

68.4

Kimi K2 0711

65.2

HELM · IFEval

Kimi K2 0711 leads by +1.4

gpt-oss-120b

83.6

Kimi K2 0711

85.0

HELM · MMLU-Pro

Kimi K2 0711 leads by +2.4

gpt-oss-120b

79.5

Kimi K2 0711

81.9

HELM · Omni-MATH

gpt-oss-120b leads by +3.4

gpt-oss-120b

68.8

Kimi K2 0711

65.4

HELM · WildBench

Kimi K2 0711 leads by +1.7

gpt-oss-120b

84.5

Kimi K2 0711

86.2

Lech Mazur Writing

Kimi K2 0711 leads by +9.7

Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.

gpt-oss-120b

77.3

Kimi K2 0711

86.9

SimpleBench

Kimi K2 0711 leads by +5.0

SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.

gpt-oss-120b

6.5

Kimi K2 0711

11.6

Terminal Bench

Kimi K2 0711 leads by +9.1

Terminal Bench · tests the ability to accomplish real-world tasks using terminal commands, evaluating shell scripting and CLI tool proficiency.

gpt-oss-120b

18.7

Kimi K2 0711

27.8

WeirdML

gpt-oss-120b leads by +8.8

WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.

gpt-oss-120b

48.2

Kimi K2 0711

39.4

Full benchmark table

Benchmark	gpt-oss-120b	Kimi K2 0711
Aider polyglot	41.8	59.1
Fiction.LiveBench	44.4	61.1
HELM · GPQA	68.4	65.2
HELM · IFEval	83.6	85.0
HELM · MMLU-Pro	79.5	81.9
HELM · Omni-MATH	68.8	65.4
HELM · WildBench	84.5	86.2
Lech Mazur Writing	77.3	86.9
SimpleBench	6.5	11.6
Terminal Bench	18.7	27.8
WeirdML	48.2	39.4
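
The 8-of-11 tally and the per-benchmark leads can be re-derived from the table above. This is a minimal sketch, not the page's own method: the `SCORES` mapping and the helper name are illustrative, the scores are the rounded values from the table, and the comparison assumes no ties (there are none in this data).

```python
from collections import Counter

MODELS = ("gpt-oss-120b", "Kimi K2 0711")

# (gpt-oss-120b, Kimi K2 0711) scores copied from the table above.
SCORES = {
    "Aider polyglot": (41.8, 59.1),
    "Fiction.LiveBench": (44.4, 61.1),
    "HELM GPQA": (68.4, 65.2),
    "HELM IFEval": (83.6, 85.0),
    "HELM MMLU-Pro": (79.5, 81.9),
    "HELM Omni-MATH": (68.8, 65.4),
    "HELM WildBench": (84.5, 86.2),
    "Lech Mazur Writing": (77.3, 86.9),
    "SimpleBench": (6.5, 11.6),
    "Terminal Bench": (18.7, 27.8),
    "WeirdML": (48.2, 39.4),
}


def leader_and_margin(first: float, second: float) -> tuple[str, float]:
    """Return (leading model, margin in points) for one benchmark row."""
    winner = MODELS[0] if first > second else MODELS[1]
    return winner, round(abs(first - second), 1)


results = {name: leader_and_margin(*pair) for name, pair in SCORES.items()}
wins = Counter(winner for winner, _ in results.values())

for name, (winner, margin) in results.items():
    print(f"{name}: {winner} leads by +{margin}")
print(dict(wins))
```

Because the inputs are rounded to one decimal, a recomputed margin can differ from the lead shown on the page by 0.1 (for example, Lech Mazur Writing and SimpleBench).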

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input	Output	Context	Projected $/mo
gpt-oss-120b	$0.04	$0.19	131K tokens	$0.77
Kimi K2 0711	$0.57	$2.30	131K tokens	$10.03
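
The projected monthly figures depend on how the 10M tokens divide between input and output, which the page does not state. The sketch below assumes a 75% input / 25% output split; that split is an assumption chosen for illustration, and it happens to land within a cent of the projections in the table. The function and parameter names are hypothetical.

```python
# Per-1M-token prices in USD (input, output), copied from the pricing table above.
PRICES = {
    "gpt-oss-120b": (0.04, 0.19),
    "Kimi K2 0711": (0.57, 2.30),
}


def projected_monthly_cost(
    model: str,
    total_tokens_m: float = 10.0,  # monthly volume in millions of tokens
    input_share: float = 0.75,     # ASSUMPTION: fraction of tokens that are input
) -> float:
    """Estimate monthly spend in USD for a given volume and input/output mix."""
    input_price, output_price = PRICES[model]
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m - input_m
    return input_m * input_price + output_m * output_price


for model in PRICES:
    print(f"{model}: ${projected_monthly_cost(model):.3f} per month")
```

Changing `input_share` shows how sensitive the projection is to workload shape: an output-heavy workload raises both totals, since output tokens cost several times more than input tokens for either model.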
