Qwen3 235B A22B Instruct 2507 vs Gemini 2.0 Flash
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen3 235B A22B Instruct 2507 wins on 3/5 benchmarks
Qwen3 235B A22B Instruct 2507 wins 3 of 5 shared benchmarks and leads in the coding and arena categories.
Category leads
coding · Qwen3 235B A22B Instruct 2507
reasoning · Gemini 2.0 Flash
arena · Qwen3 235B A22B Instruct 2507
knowledge · Gemini 2.0 Flash
Hype vs Reality
Attention vs performance
Qwen3 235B A22B Instruct 2507
#99 by perf·no signal
Gemini 2.0 Flash
#101 by perf·no signal
Best value
Qwen3 235B A22B Instruct 2507
3.0x better value than Gemini 2.0 Flash
Qwen3 235B A22B Instruct 2507
567.3 pts/$
$0.09/M
Gemini 2.0 Flash
192.0 pts/$
$0.25/M
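The 3.0x figure follows from the two pts/$ values shown above: 567.3 / 192.0 is roughly 2.95, which rounds to 3.0. A minimal sketch of that arithmetic is below. The function name is illustrative, and how the page derives pts/$ in the first place is not stated, so the sketch takes the displayed values as given.

```python
# Points-per-dollar values as displayed on the page.
PTS_PER_DOLLAR = {
    "Qwen3 235B A22B Instruct 2507": 567.3,
    "Gemini 2.0 Flash": 192.0,
}


def value_ratio(scores: dict[str, float]) -> tuple[str, float]:
    """Return the best-value model and its pts/$ ratio over the runner-up."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, next_score) = ranked[0], ranked[1]
    return best, best_score / next_score


best, ratio = value_ratio(PTS_PER_DOLLAR)
print(f"{best}: {ratio:.1f}x better value")
```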
Vendor risk
Who is behind the model
Alibaba (Qwen)
$293.0B·Tier 1
Google DeepMind
$4.00T·Tier 1
Head to head
5 benchmarks · 2 models
Qwen3 235B A22B Instruct 2507 · Gemini 2.0 Flash
Aider Polyglot
Qwen3 235B A22B Instruct 2507 leads by +21.4
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Qwen3 235B A22B Instruct 2507
59.6
Gemini 2.0 Flash
38.2
ARC-AGI-2
Gemini 2.0 Flash leads by +0.1
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
Qwen3 235B A22B Instruct 2507
1.3
Gemini 2.0 Flash
1.3
Chatbot Arena Elo · Overall
Qwen3 235B A22B Instruct 2507 leads by +62.7
Qwen3 235B A22B Instruct 2507
1422.6
Gemini 2.0 Flash
1360.0
Fiction.LiveBench
Gemini 2.0 Flash leads by +8.2
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
Qwen3 235B A22B Instruct 2507
52.9
Gemini 2.0 Flash
61.1
WeirdML
Qwen3 235B A22B Instruct 2507 leads by +12.9
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
Qwen3 235B A22B Instruct 2507
38.7
Gemini 2.0 Flash
25.8
Full benchmark table
| Benchmark | Qwen3 235B A22B Instruct 2507 | Gemini 2.0 Flash |
|---|---|---|
| Aider Polyglot | 59.6 | 38.2 |
| ARC-AGI-2 | 1.3 | 1.3 |
| Chatbot Arena Elo · Overall | 1422.6 | 1360.0 |
| Fiction.LiveBench | 52.9 | 61.1 |
| WeirdML | 38.7 | 25.8 |
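The "3 of 5" summary can be checked against the table with a short tally. This is a sketch under two assumptions: higher is better on all five benchmarks, and the displayed (rounded) scores are the inputs. On those inputs ARC-AGI-2 comes out as a tie and the Arena margin as 62.6, whereas the page reports +0.1 and +62.7, presumably computed from unrounded scores.

```python
# Scores as displayed in the benchmark table: (Qwen3, Gemini).
SCORES = {
    "Aider Polyglot": (59.6, 38.2),
    "ARC-AGI-2": (1.3, 1.3),
    "Chatbot Arena Elo (Overall)": (1422.6, 1360.0),
    "Fiction.LiveBench": (52.9, 61.1),
    "WeirdML": (38.7, 25.8),
}


def tally(scores):
    """Count wins per model and record each margin (Qwen3 minus Gemini)."""
    wins = {"qwen": 0, "gemini": 0, "tie": 0}
    margins = {}
    for name, (qwen, gemini) in scores.items():
        margin = round(qwen - gemini, 1)  # round away float noise
        margins[name] = margin
        if margin > 0:
            wins["qwen"] += 1
        elif margin < 0:
            wins["gemini"] += 1
        else:
            wins["tie"] += 1
    return wins, margins


wins, margins = tally(SCORES)
for name, margin in margins.items():
    print(f"{name}: {margin:+.1f}")
print(wins)
```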
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Qwen3 235B A22B Instruct 2507 | $0.07 | $0.10 | 262K tokens (~131 books) | $0.78 |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1.0M tokens (~500 books) | $1.75 |
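The page does not say how the 10M tokens are divided between input and output. A 75% input / 25% output split reproduces both projected figures ($0.78 and $1.75), so the sketch below uses that split as an assumption inferred from the numbers, not a documented formula. The function and parameter names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-1M-token prices in USD, taken from the pricing table.
PRICES = {
    "Qwen3 235B A22B Instruct 2507": {"input": Decimal("0.07"), "output": Decimal("0.10")},
    "Gemini 2.0 Flash": {"input": Decimal("0.10"), "output": Decimal("0.40")},
}


def projected_monthly_cost(price, total_millions=Decimal("10"), input_share=Decimal("0.75")):
    """Estimate monthly cost in USD for a token volume given in millions.

    input_share is an ASSUMPTION (75% input, 25% output), chosen because it
    matches the figures shown on the page.
    """
    input_m = total_millions * input_share
    output_m = total_millions - input_m
    cost = input_m * price["input"] + output_m * price["output"]
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for model, price in PRICES.items():
    print(f"{model}: ${projected_monthly_cost(price)}")
```

Changing `input_share` shows how sensitive the comparison is to workload shape: output-heavy workloads widen the gap, since Gemini 2.0 Flash output tokens cost four times Qwen3's.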