Grok 3 Mini vs Qwen3 235B A22B Instruct 2507
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Grok 3 Mini wins 3 of 5 shared benchmarks, leading in the reasoning and knowledge categories; Qwen3 235B A22B Instruct 2507 leads in coding.
Category leads
Coding · Qwen3 235B A22B Instruct 2507
Reasoning · Grok 3 Mini
Knowledge · Grok 3 Mini
Hype vs Reality
Attention vs performance
Grok 3 Mini · #110 by perf · no signal
Qwen3 235B A22B Instruct 2507 · #99 by perf · no signal
Best value
Qwen3 235B A22B Instruct 2507
4.9x better value than Grok 3 Mini
| Model | Value | Blended price |
|---|---|---|
| Grok 3 Mini | 116.5 pts/$ | $0.40/M |
| Qwen3 235B A22B Instruct 2507 | 567.3 pts/$ | $0.09/M |
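As a sanity check, the 4.9x best-value figure above is simply the ratio of the two pts/$ numbers:

```python
# Value ratio between the two models, using the pts/$ figures above.
grok_value = 116.5   # Grok 3 Mini, pts/$
qwen_value = 567.3   # Qwen3 235B A22B Instruct 2507, pts/$

ratio = qwen_value / grok_value
print(round(ratio, 1))  # 4.9
```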
Vendor risk
Who is behind the model
xAI · $250.0B · Tier 1
Alibaba (Qwen) · $293.0B · Tier 1
Head to head
5 benchmarks · 2 models
Grok 3 Mini · Qwen3 235B A22B Instruct 2507
Aider Polyglot
Qwen3 235B A22B Instruct 2507 leads by +10.3
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Grok 3 Mini
49.3
Qwen3 235B A22B Instruct 2507
59.6
ARC-AGI
Grok 3 Mini leads by +5.5
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
Grok 3 Mini
16.5
Qwen3 235B A22B Instruct 2507
11.0
ARC-AGI-2
Qwen3 235B A22B Instruct 2507 leads by +0.9
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
Grok 3 Mini
0.4
Qwen3 235B A22B Instruct 2507
1.3
Fiction.LiveBench
Grok 3 Mini leads by +13.8
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
Grok 3 Mini
66.7
Qwen3 235B A22B Instruct 2507
52.9
WeirdML
Grok 3 Mini leads by +3.9
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Grok 3 Mini
42.6
Qwen3 235B A22B Instruct 2507
38.7
Full benchmark table
| Benchmark | Grok 3 Mini | Qwen3 235B A22B Instruct 2507 |
|---|---|---|
| Aider Polyglot | 49.3 | 59.6 |
| ARC-AGI | 16.5 | 11.0 |
| ARC-AGI-2 | 0.4 | 1.3 |
| Fiction.LiveBench | 66.7 | 52.9 |
| WeirdML | 42.6 | 38.7 |
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Grok 3 Mini | $0.30 | $0.50 | 131K tokens (~66 books) | $3.50 |
| Qwen3 235B A22B Instruct 2507 | $0.07 | $0.10 | 262K tokens (~131 books) | $0.78 |
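The projected monthly figures are consistent with a 10M-token month split 3:1 between input and output tokens. The page does not state its assumed mix, so the 75% input share below is an inferred assumption:

```python
def projected_monthly_cost(input_price, output_price,
                           tokens_m=10.0, input_share=0.75):
    """Monthly cost in $ for tokens_m million tokens at the given
    per-1M-token prices, assuming input_share of traffic is input
    (an assumption; the comparison page does not publish its mix)."""
    input_tokens = tokens_m * input_share        # millions of input tokens
    output_tokens = tokens_m * (1 - input_share)  # millions of output tokens
    return input_tokens * input_price + output_tokens * output_price

print(projected_monthly_cost(0.30, 0.50))            # Grok 3 Mini -> 3.5
print(round(projected_monthly_cost(0.07, 0.10), 2))  # Qwen3 -> 0.78
```

Both results match the Projected $/mo column, which suggests the site blends input and output prices at roughly 3:1 rather than a simple average.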