Qwen3 235B A22B Instruct 2507 vs Gemini 2.0 Flash

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Qwen3 235B A22B Instruct 2507 wins 3 of 5 shared benchmarks and leads the coding and arena categories.

Category leads
coding · Qwen3 235B A22B Instruct 2507
reasoning · Gemini 2.0 Flash
arena · Qwen3 235B A22B Instruct 2507
knowledge · Gemini 2.0 Flash
Hype vs Reality
Qwen3 235B A22B Instruct 2507
#99 by perf·no signal
QUIET
Gemini 2.0 Flash
#101 by perf·no signal
QUIET
Best value
3.0x better value than Gemini 2.0 Flash
Qwen3 235B A22B Instruct 2507
567.3 pts/$
$0.09/M
Gemini 2.0 Flash
192.0 pts/$
$0.25/M
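The "3.0x better value" headline appears to be the ratio of the two points-per-dollar figures above. A minimal sketch of that arithmetic, assuming nothing beyond the two numbers shown (the function and dictionary names are illustrative, not part of the site):

```python
# Points-per-dollar figures copied from the "Best value" panel.
POINTS_PER_DOLLAR = {
    "Qwen3 235B A22B Instruct 2507": 567.3,
    "Gemini 2.0 Flash": 192.0,
}


def value_ratio(leader: str, other: str) -> float:
    """Return how many times more points per dollar `leader` offers than `other`."""
    return POINTS_PER_DOLLAR[leader] / POINTS_PER_DOLLAR[other]


ratio = value_ratio("Qwen3 235B A22B Instruct 2507", "Gemini 2.0 Flash")
print(f"{ratio:.1f}x better value")
```

The unrounded ratio is a little under 3, so the headline figure is a one-decimal rounding rather than an exact multiple.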
Vendor risk
Alibaba (Qwen)
$293.0B·Tier 1
Low risk
Google DeepMind
$4.00T·Tier 1
Low risk
Head to head
Qwen3 235B A22B Instruct 2507 vs Gemini 2.0 Flash
Aider polyglot
Qwen3 235B A22B Instruct 2507 leads by +21.4
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Qwen3 235B A22B Instruct 2507
59.6
Gemini 2.0 Flash
38.2
ARC-AGI-2
Gemini 2.0 Flash leads by +0.1 (both scores display as 1.3 after rounding)
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
Qwen3 235B A22B Instruct 2507
1.3
Gemini 2.0 Flash
1.3
Chatbot Arena Elo · Overall
Qwen3 235B A22B Instruct 2507 leads by +62.7
Qwen3 235B A22B Instruct 2507
1422.6
Gemini 2.0 Flash
1360.0
Fiction.LiveBench
Gemini 2.0 Flash leads by +8.2
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
Qwen3 235B A22B Instruct 2507
52.9
Gemini 2.0 Flash
61.1
WeirdML
Qwen3 235B A22B Instruct 2507 leads by +12.9
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Qwen3 235B A22B Instruct 2507
38.7
Gemini 2.0 Flash
25.8
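The "wins 3 of 5" summary can be checked against the displayed scores. The sketch below tallies leads from the numbers as shown on this page; the helper name and dictionary keys are illustrative. At displayed precision ARC-AGI-2 comes out as a tie, whereas the page credits Gemini 2.0 Flash with a +0.1 lead, presumably from unrounded scores.

```python
# (benchmark, Qwen3 score, Gemini score) as displayed in the head-to-head section.
SCORES = [
    ("Aider polyglot", 59.6, 38.2),
    ("ARC-AGI-2", 1.3, 1.3),
    ("Chatbot Arena Elo · Overall", 1422.6, 1360.0),
    ("Fiction.LiveBench", 52.9, 61.1),
    ("WeirdML", 38.7, 25.8),
]


def tally(scores):
    """Count how many benchmarks each model leads, using displayed values only."""
    wins = {"qwen": 0, "gemini": 0, "tie": 0}
    for _name, qwen, gemini in scores:
        if qwen > gemini:
            wins["qwen"] += 1
        elif gemini > qwen:
            wins["gemini"] += 1
        else:
            wins["tie"] += 1
    return wins


result = tally(SCORES)
print(result)
```

Either way, Qwen3 235B A22B Instruct 2507 leads on Aider polyglot, Chatbot Arena Elo, and WeirdML, which matches the 3-of-5 count.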
Full benchmark table
Benchmark | Qwen3 235B A22B Instruct 2507 | Gemini 2.0 Flash
Aider polyglot | 59.6 | 38.2
ARC-AGI-2 | 1.3 | 1.3
Chatbot Arena Elo · Overall | 1422.6 | 1360.0
Fiction.LiveBench | 52.9 | 61.1
WeirdML | 38.7 | 25.8
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model | Input | Output | Context | Projected $/mo
Qwen3 235B A22B Instruct 2507 | $0.07 | $0.10 | 262K tokens (~131 books) | $0.78
Gemini 2.0 Flash | $0.10 | $0.40 | 1.0M tokens (~500 books) | $1.75
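The page does not state how the 10M monthly tokens are split between input and output. The projected figures are consistent with a 75% input / 25% output mix, so the sketch below uses that split as an assumption; the function and its parameter names are hypothetical.

```python
def projected_monthly_cost(
    input_price: float,
    output_price: float,
    total_tokens_m: float = 10.0,
    input_share: float = 0.75,  # assumed split; not stated on the page
) -> float:
    """Estimate monthly cost in dollars from per-1M-token prices."""
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m - input_m
    return input_m * input_price + output_m * output_price


qwen_cost = projected_monthly_cost(0.07, 0.10)
gemini_cost = projected_monthly_cost(0.10, 0.40)
print(qwen_cost, gemini_cost)
```

Under this assumption the estimates come to roughly $0.775 and $1.75, in line with the table's $0.78 and $1.75. A workload with a heavier output share would widen the gap, since the output prices differ by 4x while the input prices differ by less than 1.5x.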