
DeepSeek V3.2 vs GLM 4.7

Side by side · benchmarks, pricing, and signals you can act on.


Winner summary

GLM 4.7 wins on 14/23 benchmarks

GLM 4.7 wins 14 of the 23 shared benchmarks, and DeepSeek V3.2 wins the other 9. GLM 4.7 leads the arena, knowledge, reasoning, and language categories.

Category leads

arena · GLM 4.7
knowledge · GLM 4.7
math · DeepSeek V3.2
coding · DeepSeek V3.2
reasoning · GLM 4.7
language · GLM 4.7

Hype vs Reality

Attention vs performance

DeepSeek V3.2 · #84 by performance · no signal · QUIET

GLM 4.7 · #93 by performance · no signal · QUIET


Best value

DeepSeek V3.2: 3.5x better value than GLM 4.7.

DeepSeek V3.2 · 168.3 pts/$ · $0.32/M tokens

GLM 4.7 · 47.6 pts/$ · $1.06/M tokens
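The $/M figures on this card match a plain average of each model's input and output price from the pricing table ($0.315 rounds to $0.32; $1.06 is exact), and the 3.5x headline is the ratio of the two pts/$ figures. A minimal sketch under that assumption: the averaging rule is inferred from the numbers rather than documented, `blended_price` is an illustrative name, and the score behind "pts" is not specified on this page.

```python
# Per-1M-token prices in USD (input, output), copied from the pricing table.
PRICES = {
    "DeepSeek V3.2": (0.25, 0.38),
    "GLM 4.7": (0.38, 1.74),
}


def blended_price(input_price: float, output_price: float) -> float:
    """Assumed blending rule: the plain mean of the input and output price."""
    return (input_price + output_price) / 2


for model, (inp, out) in PRICES.items():
    # Three decimals so the unrounded blend is visible.
    print(f"{model}: ${blended_price(inp, out):.3f}/M tokens")

# The headline multiple is the ratio of the two pts/$ figures on the card.
value_ratio = 168.3 / 47.6
print(f"value ratio: {value_ratio:.2f}x")
```

A plain mean treats input and output tokens as equally common; real workloads are usually input-heavy, so the effective price per token can differ from this blend.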


Vendor risk

Mixed exposure: one or more vendors flagged.

DeepSeek · $3.4B · Tier 1 · Higher risk

z-ai · private · undisclosed · Unknown risk


Head to head

23 benchmarks · 2 models


Chatbot Arena Elo · Coding

GLM 4.7 leads by +112.3

DeepSeek V3.2

1326.9

GLM 4.7

1439.2

Chatbot Arena Elo · Overall

GLM 4.7 leads by +18.3

DeepSeek V3.2

1424.4

GLM 4.7

1442.7
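Arena ratings sit on an Elo-style scale, so a rating gap can be read as an expected head-to-head score. A minimal sketch, assuming the standard Elo logistic formula (base 10, 400-point scale); `elo_expected_score` is an illustrative name, not part of any Arena API.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model
    (base-10 logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Arena ratings from the two cards: GLM 4.7 (A) vs DeepSeek V3.2 (B).
coding = elo_expected_score(1439.2, 1326.9)
overall = elo_expected_score(1442.7, 1424.4)

print(f"Coding:  expected score for GLM 4.7 = {coding:.3f}")
print(f"Overall: expected score for GLM 4.7 = {overall:.3f}")
```

Under this assumption the 112.3-point coding gap corresponds to an expected score of about 0.66 for GLM 4.7, while the 18.3-point overall gap is close to even at about 0.53. The page does not show the ratings' confidence intervals, so a gap as small as the overall one may not be meaningful.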

Chess Puzzles

DeepSeek V3.2 leads by +8.0

Chess Puzzles · tests strategic and tactical reasoning by having models solve chess puzzle positions, evaluating lookahead and pattern recognition abilities.

DeepSeek V3.2

14.0

GLM 4.7

6.0

FrontierMath-2025-02-28-Private

DeepSeek V3.2 leads by +19.7

FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.

DeepSeek V3.2

22.1

GLM 4.7

2.4

FrontierMath-Tier-4-2025-07-01-Private

DeepSeek V3.2 leads by +2.0

FrontierMath Tier 4 (Jul 2025) · the most challenging tier of FrontierMath, with problems at the outer limit of current AI mathematical reasoning.

DeepSeek V3.2

2.1

GLM 4.7

0.1

GPQA Diamond

DeepSeek V3.2 leads by +0.1

Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.

DeepSeek V3.2

77.9

GLM 4.7

77.8

LiveBench · Agentic Coding

DeepSeek V3.2 leads by +5.0

DeepSeek V3.2

46.7

GLM 4.7

41.7

LiveBench · Coding

DeepSeek V3.2 leads by +2.6

DeepSeek V3.2

75.7

GLM 4.7

73.1

LiveBench · Data Analysis

GLM 4.7 leads by +10.1

DeepSeek V3.2

45.0

GLM 4.7

55.2

LiveBench · IF (instruction following)

GLM 4.7 leads by +12.6

DeepSeek V3.2

23.1

GLM 4.7

35.7

LiveBench · Language

GLM 4.7 leads by +1.0

DeepSeek V3.2

64.2

GLM 4.7

65.2

LiveBench · Mathematics

GLM 4.7 leads by +12.1

DeepSeek V3.2

64.0

GLM 4.7

76.0

LiveBench · Overall

GLM 4.7 leads by +6.3

DeepSeek V3.2

51.8

GLM 4.7

58.1

LiveBench · Reasoning

GLM 4.7 leads by +15.5

DeepSeek V3.2

44.3

GLM 4.7

59.7

OpenCompass · AIME2025

GLM 4.7 leads by +2.4

DeepSeek V3.2

93.0

GLM 4.7

95.4

OpenCompass · GPQA-Diamond

GLM 4.7 leads by +2.3

DeepSeek V3.2

84.6

GLM 4.7

86.9

OpenCompass · HLE

GLM 4.7 leads by +2.2

DeepSeek V3.2

23.2

GLM 4.7

25.4

OpenCompass · IFEval

GLM 4.7 leads by +0.5

DeepSeek V3.2

89.7

GLM 4.7

90.2

OpenCompass · LiveCodeBenchV6

GLM 4.7 leads by +8.4

DeepSeek V3.2

75.4

GLM 4.7

83.8

OpenCompass · MMLU-Pro

DeepSeek V3.2 leads by +1.8

DeepSeek V3.2

85.8

GLM 4.7

84.0

OTIS Mock AIME 2024-2025

DeepSeek V3.2 leads by +4.5

OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.

DeepSeek V3.2

87.8

GLM 4.7

83.3

SimpleQA Verified

GLM 4.7 leads by +4.0

SimpleQA Verified · short factual questions with verified answers, measuring factual accuracy and the tendency to hallucinate or provide incorrect information.

DeepSeek V3.2

27.5

GLM 4.7

31.5

Terminal Bench

DeepSeek V3.2 leads by +6.2

Terminal-Bench 2.0 · evaluates AI agents on real terminal-based coding tasks (writing scripts, debugging, running tests, and managing projects) carried out entirely through command-line interaction, testing both code quality and terminal fluency. For reference, Claude Opus 4.7 scores 69.4%.

DeepSeek V3.2

39.6

GLM 4.7

33.4

Full benchmark table

Benchmark	DeepSeek V3.2	GLM 4.7
Chatbot Arena Elo · Coding	1326.9	1439.2
Chatbot Arena Elo · Overall	1424.4	1442.7
Chess Puzzles	14.0	6.0
FrontierMath-2025-02-28-Private	22.1	2.4
FrontierMath-Tier-4-2025-07-01-Private	2.1	0.1
GPQA Diamond	77.9	77.8
LiveBench · Agentic Coding	46.7	41.7
LiveBench · Coding	75.7	73.1
LiveBench · Data Analysis	45.0	55.2
LiveBench · IF	23.1	35.7
LiveBench · Language	64.2	65.2
LiveBench · Mathematics	64.0	76.0
LiveBench · Overall	51.8	58.1
LiveBench · Reasoning	44.3	59.7
OpenCompass · AIME2025	93.0	95.4
OpenCompass · GPQA-Diamond	84.6	86.9
OpenCompass · HLE	23.2	25.4
OpenCompass · IFEval	89.7	90.2
OpenCompass · LiveCodeBenchV6	75.4	83.8
OpenCompass · MMLU-Pro	85.8	84.0
OTIS Mock AIME 2024-2025	87.8	83.3
SimpleQA Verified	27.5	31.5
Terminal Bench	39.6	33.4
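The 14-of-23 headline can be re-derived from this table by counting, row by row, which model scores higher. A minimal sketch, assuming higher is better on every row (true for the Elo ratings and accuracy-style scores listed here); `SCORES` and `tally` are illustrative names.

```python
# (benchmark, DeepSeek V3.2, GLM 4.7), copied from the full benchmark table.
SCORES = [
    ("Chatbot Arena Elo · Coding", 1326.9, 1439.2),
    ("Chatbot Arena Elo · Overall", 1424.4, 1442.7),
    ("Chess Puzzles", 14.0, 6.0),
    ("FrontierMath-2025-02-28-Private", 22.1, 2.4),
    ("FrontierMath-Tier-4-2025-07-01-Private", 2.1, 0.1),
    ("GPQA Diamond", 77.9, 77.8),
    ("LiveBench · Agentic Coding", 46.7, 41.7),
    ("LiveBench · Coding", 75.7, 73.1),
    ("LiveBench · Data Analysis", 45.0, 55.2),
    ("LiveBench · IF", 23.1, 35.7),
    ("LiveBench · Language", 64.2, 65.2),
    ("LiveBench · Mathematics", 64.0, 76.0),
    ("LiveBench · Overall", 51.8, 58.1),
    ("LiveBench · Reasoning", 44.3, 59.7),
    ("OpenCompass · AIME2025", 93.0, 95.4),
    ("OpenCompass · GPQA-Diamond", 84.6, 86.9),
    ("OpenCompass · HLE", 23.2, 25.4),
    ("OpenCompass · IFEval", 89.7, 90.2),
    ("OpenCompass · LiveCodeBenchV6", 75.4, 83.8),
    ("OpenCompass · MMLU-Pro", 85.8, 84.0),
    ("OTIS Mock AIME 2024-2025", 87.8, 83.3),
    ("SimpleQA Verified", 27.5, 31.5),
    ("Terminal Bench", 39.6, 33.4),
]


def tally(rows):
    """Count per-benchmark wins, assuming higher is better on every row."""
    wins = {"DeepSeek V3.2": 0, "GLM 4.7": 0, "tie": 0}
    for _name, deepseek, glm in rows:
        if deepseek > glm:
            wins["DeepSeek V3.2"] += 1
        elif glm > deepseek:
            wins["GLM 4.7"] += 1
        else:
            wins["tie"] += 1
    return wins


print(tally(SCORES))
```

A raw win count weights a 0.1-point GPQA Diamond edge the same as a 112.3-point Arena coding gap, and the scales differ across rows, so the count says nothing about margin size.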

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input ($/1M tokens)	Output ($/1M tokens)	Context window	Projected $/mo (10M tokens)
DeepSeek V3.2	$0.25	$0.38	131K tokens (~66 books)	$2.83
GLM 4.7	$0.38	$1.74	203K tokens (~101 books)	$7.20
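The projected $/mo column is consistent with 10M tokens split 75% input and 25% output: 7.5M × $0.25 + 2.5M × $0.38 = $2.825 for DeepSeek V3.2, and 7.5M × $0.38 + 2.5M × $1.74 = $7.20 for GLM 4.7. That split is inferred from the numbers, not stated on the page. A minimal sketch; `projected_monthly_cost` and its `input_share` parameter are illustrative names so you can plug in your own ratio.

```python
# USD per 1M tokens (input, output), from the pricing table above.
PRICES = {
    "DeepSeek V3.2": (0.25, 0.38),
    "GLM 4.7": (0.38, 1.74),
}


def projected_monthly_cost(
    input_price: float,
    output_price: float,
    total_tokens: int = 10_000_000,
    input_share: float = 0.75,  # assumed 75/25 split; not stated on the page
) -> float:
    """Monthly cost in USD for total_tokens, of which input_share are input tokens."""
    millions = total_tokens / 1_000_000
    return millions * (input_share * input_price + (1 - input_share) * output_price)


for model, (inp, out) in PRICES.items():
    print(f"{model}: ${projected_monthly_cost(inp, out):.3f}/mo")
```

GLM 4.7's output price is about 4.6x DeepSeek V3.2's, versus about 1.5x on input, so the cost gap widens on output-heavy workloads; for example, try `input_share=0.5`.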