GPT-5.2-Codex vs GLM 5.1 vs Qwen3.6 Plus
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
GPT-5.2-Codex wins 5 of the 11 shared benchmarks, with its leads concentrated in reasoning, math, and knowledge.
Category leads
- coding · GLM 5.1
- reasoning · GPT-5.2-Codex
- language · GLM 5.1
- math · GPT-5.2-Codex
- knowledge · GPT-5.2-Codex
- speed · GLM 5.1
Hype vs Reality
Attention vs performance
- GPT-5.2-Codex · ranked #15 by performance · no attention signal
- GLM 5.1 · ranked #16 by performance · no attention signal
- Qwen3.6 Plus · ranked #14 by performance · no attention signal
Best value
Qwen3.6 Plus · 2.0x better value than GLM 5.1

| Model | Value (pts/$) | Blended price ($/M) |
|---|---|---|
| GPT-5.2-Codex | 9.0 | $7.88 |
| GLM 5.1 | 30.9 | $2.27 |
| Qwen3.6 Plus | 62.3 | $1.14 |
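The pts/$ figures above appear to divide a benchmark score by the blended per-million-token price. A minimal sketch of that calculation follows; which score the page actually feeds into the ratio is not stated, so LiveBench Overall is assumed here for illustration, and the resulting figures differ slightly from the ones shown.

```python
# Sketch of a points-per-dollar value metric: benchmark score divided by
# blended $/M price. The score source is an assumption (LiveBench Overall),
# so the numbers are close to, but not identical to, the page's figures.
blended_price = {"GPT-5.2-Codex": 7.88, "GLM 5.1": 2.27, "Qwen3.6 Plus": 1.14}
overall_score = {"GPT-5.2-Codex": 74.3, "GLM 5.1": 70.2, "Qwen3.6 Plus": 70.8}

value = {m: overall_score[m] / blended_price[m] for m in blended_price}
for model, pts_per_dollar in sorted(value.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {pts_per_dollar:.1f} pts/$")
```

Whatever score is used, the ordering is robust: Qwen3.6 Plus is far cheaper per point than GLM 5.1, which in turn is far cheaper than GPT-5.2-Codex.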
Vendor risk
Who is behind each model

| Model | Vendor | Valuation · tier |
|---|---|---|
| GPT-5.2-Codex | OpenAI | $840.0B · Tier 1 |
| GLM 5.1 | z-ai | private · undisclosed |
| Qwen3.6 Plus | Alibaba (Qwen) | $293.0B · Tier 1 |
Head to head
11 benchmarks · 3 models

- LiveBench · Agentic Coding: GLM 5.1 and Qwen3.6 Plus tie at the top (GPT-5.2-Codex 51.7, GLM 5.1 55.0, Qwen3.6 Plus 55.0)
- LiveBench · Coding: GPT-5.2-Codex leads by +5.4 (GPT-5.2-Codex 83.6, GLM 5.1 75.4, Qwen3.6 Plus 78.2)
- LiveBench · Data Analysis: GPT-5.2-Codex leads by +8.3 (GPT-5.2-Codex 78.2, GLM 5.1 63.2, Qwen3.6 Plus 69.9)
- LiveBench · IF: GLM 5.1 leads by +2.0 (GPT-5.2-Codex 66.5, GLM 5.1 68.5, Qwen3.6 Plus 58.3)
- LiveBench · Language: Qwen3.6 Plus leads by +1.3 (GPT-5.2-Codex 73.7, GLM 5.1 71.8, Qwen3.6 Plus 75.0)
- LiveBench · Mathematics: GPT-5.2-Codex leads by +3.9 (GPT-5.2-Codex 88.8, GLM 5.1 84.9, Qwen3.6 Plus 83.7)
- LiveBench · Overall: GPT-5.2-Codex leads by +3.5 (GPT-5.2-Codex 74.3, GLM 5.1 70.2, Qwen3.6 Plus 70.8)
- LiveBench · Reasoning: GPT-5.2-Codex leads by +1.9 (GPT-5.2-Codex 77.7, GLM 5.1 72.5, Qwen3.6 Plus 75.8)
- Artificial Analysis · Agentic Index: GLM 5.1 leads by +5.4; no score for GPT-5.2-Codex (GLM 5.1 67.0, Qwen3.6 Plus 61.7)
  A composite score measuring how well a model performs in agentic workflows: multi-step tool use, planning, error recovery, and autonomous task completion. It aggregates results from multiple agentic benchmarks, including SWE-bench, tool-use tests, and planning evaluations, and serves as the canonical single-number answer to "how good is this model as an agent?"
- Artificial Analysis · Coding Index: GLM 5.1 leads by +0.5; no score for GPT-5.2-Codex (GLM 5.1 43.4, Qwen3.6 Plus 42.9)
  A composite score that aggregates performance across multiple coding benchmarks into a single index, tracking code generation quality, debugging ability, multi-language competence, and real-world software engineering tasks. Artificial Analysis uses it to rank model coding capability in a normalized, comparable format, which makes it useful for developers choosing between models for coding-heavy workloads.
- Artificial Analysis · Quality Index: GLM 5.1 leads by +1.4; no score for GPT-5.2-Codex (GLM 5.1 51.4, Qwen3.6 Plus 50.0)
Full benchmark table

| Benchmark | GPT-5.2-Codex | GLM 5.1 | Qwen3.6 Plus |
|---|---|---|---|
| LiveBench · Agentic Coding | 51.7 | 55.0 | 55.0 |
| LiveBench · Coding | 83.6 | 75.4 | 78.2 |
| LiveBench · Data Analysis | 78.2 | 63.2 | 69.9 |
| LiveBench · IF | 66.5 | 68.5 | 58.3 |
| LiveBench · Language | 73.7 | 71.8 | 75.0 |
| LiveBench · Mathematics | 88.8 | 84.9 | 83.7 |
| LiveBench · Overall | 74.3 | 70.2 | 70.8 |
| LiveBench · Reasoning | 77.7 | 72.5 | 75.8 |
| Artificial Analysis · Agentic Index | — | 67.0 | 61.7 |
| Artificial Analysis · Coding Index | — | 43.4 | 42.9 |
| Artificial Analysis · Quality Index | — | 51.4 | 50.0 |
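The "5 of 11" tally in the winner summary can be reproduced from this table. A small sketch, under two stated assumptions: missing scores are skipped, and a tied top score counts as a win for every tied model (which is why GLM 5.1 and Qwen3.6 Plus each get credit for Agentic Coding):

```python
# Count per-benchmark wins from the full benchmark table above.
# None marks a missing score; ties at the top count for every tied model.
MODELS = ("GPT-5.2-Codex", "GLM 5.1", "Qwen3.6 Plus")
TABLE = {
    "LiveBench · Agentic Coding":          (51.7, 55.0, 55.0),
    "LiveBench · Coding":                  (83.6, 75.4, 78.2),
    "LiveBench · Data Analysis":           (78.2, 63.2, 69.9),
    "LiveBench · IF":                      (66.5, 68.5, 58.3),
    "LiveBench · Language":                (73.7, 71.8, 75.0),
    "LiveBench · Mathematics":             (88.8, 84.9, 83.7),
    "LiveBench · Overall":                 (74.3, 70.2, 70.8),
    "LiveBench · Reasoning":               (77.7, 72.5, 75.8),
    "Artificial Analysis · Agentic Index": (None, 67.0, 61.7),
    "Artificial Analysis · Coding Index":  (None, 43.4, 42.9),
    "Artificial Analysis · Quality Index": (None, 51.4, 50.0),
}

wins = dict.fromkeys(MODELS, 0)
for scores in TABLE.values():
    best = max(s for s in scores if s is not None)
    for model, score in zip(MODELS, scores):
        if score == best:
            wins[model] += 1

print(wins)  # GPT-5.2-Codex takes 5 of the 11 benchmarks
```

Under this counting, GLM 5.1 also reaches 5 wins (one of them shared), which is worth keeping in mind when reading the single-winner framing above.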
Pricing · per 1M tokens · projected $/mo at 10M tokens

| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| GPT-5.2-Codex | $1.75 | $14.00 | 400K tokens (~200 books) | $48.13 |
| GLM 5.1 | $1.05 | $3.50 | 203K tokens (~101 books) | $16.63 |
| Qwen3.6 Plus | $0.33 | $1.95 | 1.0M tokens (~500 books) | $7.31 |
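The page does not state how the projected monthly figures are blended, but they look consistent with a roughly 3:1 input-to-output token split. A sketch of that projection, with the split as an explicit (and assumed) parameter:

```python
# Project monthly spend for a given token volume, assuming a 3:1
# input:output split. The split is a guess; with it, the first two
# projected figures above are reproduced exactly and the third comes
# within a few cents.
def monthly_cost(input_price, output_price, total_tokens_m=10.0, input_share=0.75):
    """Blended cost in dollars for total_tokens_m million tokens."""
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m * (1 - input_share)
    return input_m * input_price + output_m * output_price

for name, inp, out in [("GPT-5.2-Codex", 1.75, 14.00),
                       ("GLM 5.1", 1.05, 3.50),
                       ("Qwen3.6 Plus", 0.33, 1.95)]:
    print(f"{name}: ${monthly_cost(inp, out):.2f}/mo")
```

A workload that is output-heavy (e.g. long generations from short prompts) would shift these totals sharply toward GPT-5.2-Codex's $14.00 output rate, so the split assumption matters more for it than for the other two models.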