GPT-5.2-Codex vs GLM 5.1 vs Qwen3.6 Plus
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
GPT-5.2-Codex wins 5 of the 11 shared benchmarks, with its leads concentrated in reasoning, math, and knowledge.
Category leads
- coding · GLM 5.1
- reasoning · GPT-5.2-Codex
- language · GLM 5.1
- math · GPT-5.2-Codex
- knowledge · GPT-5.2-Codex
- speed · GLM 5.1
Hype vs Reality
Attention vs performance
- GPT-5.2-Codex · ranked #15 by performance · no attention signal
- GLM 5.1 · ranked #16 by performance · no attention signal
- Qwen3.6 Plus · ranked #14 by performance · no attention signal
Best value
Qwen3.6 Plus · 2.0x better value than GLM 5.1

| Model | Value (pts/$) | Blended price ($/M) |
|---|---|---|
| GPT-5.2-Codex | 9.0 | $7.88 |
| GLM 5.1 | 30.9 | $2.27 |
| Qwen3.6 Plus | 62.3 | $1.14 |
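The pts/$ figures above appear to divide a benchmark score by the blended per-million-token price. A minimal sketch of that calculation follows; which score the page actually feeds into the ratio is not stated, so LiveBench Overall is assumed here for illustration, and the resulting figures differ slightly from the ones shown.

```python
# Sketch of a points-per-dollar value metric: benchmark score divided by
# blended $/M price. The score source is an assumption (LiveBench Overall),
# so the numbers are close to, but not identical to, the page's figures.
blended_price = {"GPT-5.2-Codex": 7.88, "GLM 5.1": 2.27, "Qwen3.6 Plus": 1.14}
overall_score = {"GPT-5.2-Codex": 74.3, "GLM 5.1": 70.2, "Qwen3.6 Plus": 70.8}

value = {m: overall_score[m] / blended_price[m] for m in blended_price}
for model, pts_per_dollar in sorted(value.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {pts_per_dollar:.1f} pts/$")
```

Whatever score is used, the ordering is robust: Qwen3.6 Plus is far cheaper per point than GLM 5.1, which in turn is far cheaper than GPT-5.2-Codex.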
Vendor risk
Who is behind each model

| Model | Vendor | Valuation · tier |
|---|---|---|
| GPT-5.2-Codex | OpenAI | $840.0B · Tier 1 |
| GLM 5.1 | z-ai | private · undisclosed |
| Qwen3.6 Plus | Alibaba (Qwen) | $293.0B · Tier 1 |
Head to head
11 benchmarks · 3 models

- LiveBench · Agentic Coding: GLM 5.1 and Qwen3.6 Plus tie at the top (GPT-5.2-Codex 51.7, GLM 5.1 55.0, Qwen3.6 Plus 55.0)
- LiveBench · Coding: GPT-5.2-Codex leads by +5.4 (GPT-5.2-Codex 83.6, GLM 5.1 75.4, Qwen3.6 Plus 78.2)
- LiveBench · Data Analysis: GPT-5.2-Codex leads by +8.3 (GPT-5.2-Codex 78.2, GLM 5.1 63.2, Qwen3.6 Plus 69.9)
- LiveBench · IF: GLM 5.1 leads by +2.0 (GPT-5.2-Codex 66.5, GLM 5.1 68.5, Qwen3.6 Plus 58.3)
- LiveBench · Language: Qwen3.6 Plus leads by +1.3 (GPT-5.2-Codex 73.7, GLM 5.1 71.8, Qwen3.6 Plus 75.0)
- LiveBench · Mathematics: GPT-5.2-Codex leads by +3.9 (GPT-5.2-Codex 88.8, GLM 5.1 84.9, Qwen3.6 Plus 83.7)
- LiveBench · Overall: GPT-5.2-Codex leads by +3.5 (GPT-5.2-Codex 74.3, GLM 5.1 70.2, Qwen3.6 Plus 70.8)
- LiveBench · Reasoning: GPT-5.2-Codex leads by +1.9 (GPT-5.2-Codex 77.7, GLM 5.1 72.5, Qwen3.6 Plus 75.8)
- Artificial Analysis · Agentic Index: GLM 5.1 leads by +5.4; no score for GPT-5.2-Codex (GLM 5.1 67.0, Qwen3.6 Plus 61.7)
  A composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations, and serves as the canonical single-number answer to "how good is this model as an agent?"
- Artificial Analysis · Coding Index: GLM 5.1 leads by +0.5; no score for GPT-5.2-Codex (GLM 5.1 43.4, Qwen3.6 Plus 42.9)
  A composite score that aggregates performance across multiple coding benchmarks into a single index, tracking code generation quality, debugging ability, multi-language competence, and real-world software engineering tasks. Artificial Analysis uses it to rank model coding capability in a normalized, comparable format, which makes it useful for developers choosing between models for coding-heavy workloads.
- Artificial Analysis · Quality Index: GLM 5.1 leads by +1.4; no score for GPT-5.2-Codex (GLM 5.1 51.4, Qwen3.6 Plus 50.0)
Full benchmark table

| Benchmark | GPT-5.2-Codex | GLM 5.1 | Qwen3.6 Plus |
|---|---|---|---|
| LiveBench · Agentic Coding | 51.7 | 55.0 | 55.0 |
| LiveBench · Coding | 83.6 | 75.4 | 78.2 |
| LiveBench · Data Analysis | 78.2 | 63.2 | 69.9 |
| LiveBench · IF | 66.5 | 68.5 | 58.3 |
| LiveBench · Language | 73.7 | 71.8 | 75.0 |
| LiveBench · Mathematics | 88.8 | 84.9 | 83.7 |
| LiveBench · Overall | 74.3 | 70.2 | 70.8 |
| LiveBench · Reasoning | 77.7 | 72.5 | 75.8 |
| Artificial Analysis · Agentic Index | — | 67.0 | 61.7 |
| Artificial Analysis · Coding Index | — | 43.4 | 42.9 |
| Artificial Analysis · Quality Index | — | 51.4 | 50.0 |
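The "5 of 11" tally in the winner summary can be reproduced from this table. A small sketch, under two stated assumptions: missing scores are skipped, and a tied top score counts as a win for every tied model (which is why GLM 5.1 and Qwen3.6 Plus each get credit for Agentic Coding):

```python
# Count per-benchmark wins from the full benchmark table above.
# None marks a missing score; ties at the top count for every tied model.
MODELS = ("GPT-5.2-Codex", "GLM 5.1", "Qwen3.6 Plus")
TABLE = {
    "LiveBench · Agentic Coding":          (51.7, 55.0, 55.0),
    "LiveBench · Coding":                  (83.6, 75.4, 78.2),
    "LiveBench · Data Analysis":           (78.2, 63.2, 69.9),
    "LiveBench · IF":                      (66.5, 68.5, 58.3),
    "LiveBench · Language":                (73.7, 71.8, 75.0),
    "LiveBench · Mathematics":             (88.8, 84.9, 83.7),
    "LiveBench · Overall":                 (74.3, 70.2, 70.8),
    "LiveBench · Reasoning":               (77.7, 72.5, 75.8),
    "Artificial Analysis · Agentic Index": (None, 67.0, 61.7),
    "Artificial Analysis · Coding Index":  (None, 43.4, 42.9),
    "Artificial Analysis · Quality Index": (None, 51.4, 50.0),
}

wins = dict.fromkeys(MODELS, 0)
for scores in TABLE.values():
    best = max(s for s in scores if s is not None)
    for model, score in zip(MODELS, scores):
        if score == best:
            wins[model] += 1

print(wins)  # GPT-5.2-Codex takes 5 of the 11 benchmarks
```

Under this counting, GLM 5.1 also reaches 5 wins (one of them shared), which is worth keeping in mind when reading the single-winner framing above.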
Pricing · per 1M tokens · projected $/mo at 10M tokens

| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| GPT-5.2-Codex | $1.75 | $14.00 | 400K tokens (~200 books) | $48.13 |
| GLM 5.1 | $1.05 | $3.50 | 203K tokens (~101 books) | $16.63 |
| Qwen3.6 Plus | $0.33 | $1.95 | 1.0M tokens (~500 books) | $7.31 |
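The page does not state how the projected monthly figures are blended, but they look consistent with a roughly 3:1 input-to-output token split. A sketch of that projection, with the split as an explicit (and assumed) parameter:

```python
# Project monthly spend for a given token volume, assuming a 3:1
# input:output split. The split is a guess; with it, the first two
# projected figures above are reproduced exactly and the third comes
# within a few cents.
def monthly_cost(input_price, output_price, total_tokens_m=10.0, input_share=0.75):
    """Blended cost in dollars for total_tokens_m million tokens."""
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m * (1 - input_share)
    return input_m * input_price + output_m * output_price

for name, inp, out in [("GPT-5.2-Codex", 1.75, 14.00),
                       ("GLM 5.1", 1.05, 3.50),
                       ("Qwen3.6 Plus", 0.33, 1.95)]:
    print(f"{name}: ${monthly_cost(inp, out):.2f}/mo")
```

A workload that is output-heavy (e.g. long generations from short prompts) would shift these totals sharply toward GPT-5.2-Codex's $14.00 output rate, so the split assumption matters more for it than for the other two models.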