
Qwen3 235B A22B Instruct 2507 vs Gemini 2.0 Flash

Side by side · benchmarks, pricing, and signals you can act on.


Winner summary

Qwen3 235B A22B Instruct 2507 wins on 3/5 benchmarks

Across the 5 shared benchmarks, Qwen3 235B A22B Instruct 2507 leads in the coding and arena categories.

Qwen3 235B A22B Instruct 2507

3 / 5

Gemini 2.0 Flash

2 / 5

Category leads

coding · Qwen3 235B A22B Instruct 2507
reasoning · Gemini 2.0 Flash
arena · Qwen3 235B A22B Instruct 2507
knowledge · Gemini 2.0 Flash

Hype vs Reality

Attention vs performance

Qwen3 235B A22B Instruct 2507

#99 by performance · no attention signal

QUIET

Gemini 2.0 Flash

#101 by performance · no attention signal

QUIET


Best value

Qwen3 235B A22B Instruct 2507

About 3.0x more points per dollar than Gemini 2.0 Flash

Qwen3 235B A22B Instruct 2507

567.3 pts/$

$0.09 per 1M tokens (blended)

Gemini 2.0 Flash

192.0 pts/$

$0.25 per 1M tokens (blended)
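The headline value figures can be rechecked with simple arithmetic. This is a minimal sketch, assuming the single $/M price is an unweighted average of the input and output prices from the pricing table. That assumption reproduces $0.25 for Gemini 2.0 Flash and $0.085 (shown rounded as $0.09) for Qwen3. The function names are hypothetical. The page does not say how the points behind pts/$ are computed, so the sketch only checks the ratio between the two published pts/$ values.

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Hypothetical 1:1 blend of input and output prices (USD per 1M tokens)."""
    return (input_per_m + output_per_m) / 2


def value_ratio(pts_per_dollar_a: float, pts_per_dollar_b: float) -> float:
    """How many times more points per dollar model A delivers than model B."""
    return pts_per_dollar_a / pts_per_dollar_b


qwen_blend = blended_price(0.07, 0.10)    # 0.085, displayed as $0.09
gemini_blend = blended_price(0.10, 0.40)  # 0.25
ratio = value_ratio(567.3, 192.0)         # ~2.95, displayed as 3.0x

print(f"Qwen3 blended:  ${qwen_blend:.3f}/M")
print(f"Gemini blended: ${gemini_blend:.3f}/M")
print(f"Value ratio:    {ratio:.1f}x")
```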


Vendor risk

Who is behind the model

Alibaba (Qwen)

$293.0B · Tier 1

Low risk

Google DeepMind

$4.00T · Tier 1

Low risk


Head to head

5 benchmarks · 2 models

Qwen3 235B A22B Instruct 2507 vs Gemini 2.0 Flash

Aider Polyglot

Qwen3 235B A22B Instruct 2507 leads by +21.4

Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.

Qwen3 235B A22B Instruct 2507

59.6

Gemini 2.0 Flash

38.2

ARC-AGI-2

Gemini 2.0 Flash leads by +0.1 (both scores round to 1.3)

ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing pattern recognition and abstract reasoning on novel tasks that cannot be solved from training data alone.

Qwen3 235B A22B Instruct 2507

1.3

Gemini 2.0 Flash

1.3

Chatbot Arena Elo · Overall

Qwen3 235B A22B Instruct 2507 leads by +62.7

Qwen3 235B A22B Instruct 2507

1422.6

Gemini 2.0 Flash

1360.0
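A rating gap is easier to interpret as an expected win rate. This is a minimal sketch, assuming the standard logistic Elo formula with base 10 and a 400-point scale. Chatbot Arena reports ratings on an Elo-style scale, but its exact fitting method is not stated here. `expected_win_rate` is a hypothetical helper. With the displayed ratings, a gap of roughly 63 points corresponds to Qwen3 being preferred in about 59% of head-to-head votes.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that A beats B under a base-10, 400-point logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


qwen_elo = 1422.6
gemini_elo = 1360.0

p_qwen = expected_win_rate(qwen_elo, gemini_elo)
print(f"Qwen3 expected win rate vs Gemini 2.0 Flash: {p_qwen:.1%}")
```

The model is symmetric, so `expected_win_rate(gemini_elo, qwen_elo)` is the complement, `1 - p_qwen`.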

Fiction.LiveBench

Gemini 2.0 Flash leads by +8.2

Fiction.LiveBench · a continuously updated benchmark that uses recently published fiction to test reading comprehension and reasoning while limiting data contamination.

Qwen3 235B A22B Instruct 2507

52.9

Gemini 2.0 Flash

61.1

WeirdML

Qwen3 235B A22B Instruct 2507 leads by +12.9

WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.

Qwen3 235B A22B Instruct 2507

38.7

Gemini 2.0 Flash

25.8

Full benchmark table

Benchmark	Qwen3 235B A22B Instruct 2507	Gemini 2.0 Flash
Aider Polyglot	59.6	38.2
ARC-AGI-2	1.3	1.3
Chatbot Arena Elo · Overall	1422.6	1360.0
Fiction.LiveBench	52.9	61.1
WeirdML	38.7	25.8
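The winner tally can be rechecked from the table. This is a minimal sketch using the displayed (rounded) scores, assuming higher is better on every benchmark. The `SCORES` mapping and `tally` helper are hypothetical. On the rounded values ARC-AGI-2 is a tie at 1.3, so the check returns 3 wins for Qwen3, 1 for Gemini 2.0 Flash and 1 tie. The page's 2/5 for Gemini relies on its unrounded +0.1 ARC-AGI-2 lead. The Arena margin likewise comes out as +62.6 rather than the page's +62.7.

```python
# Displayed scores: (Qwen3 235B A22B Instruct 2507, Gemini 2.0 Flash)
SCORES = {
    "Aider Polyglot": (59.6, 38.2),
    "ARC-AGI-2": (1.3, 1.3),
    "Chatbot Arena Elo · Overall": (1422.6, 1360.0),
    "Fiction.LiveBench": (52.9, 61.1),
    "WeirdML": (38.7, 25.8),
}


def tally(scores: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Count wins per model, assuming higher is better on every benchmark."""
    counts = {"qwen": 0, "gemini": 0, "tie": 0}
    for name, (qwen, gemini) in scores.items():
        # Round to one decimal to drop float noise such as 21.400000000000002
        margin = round(qwen - gemini, 1)
        if margin > 0:
            counts["qwen"] += 1
        elif margin < 0:
            counts["gemini"] += 1
        else:
            counts["tie"] += 1
        print(f"{name}: margin {margin:+.1f}")
    return counts


result = tally(SCORES)
print(result)
```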

Pricing · USD per 1M tokens · projected $/mo at 10M tokens, split 3:1 input to output

Model	Input	Output	Context	Projected $/mo
Qwen3 235B A22B Instruct 2507	$0.07	$0.10	262K tokens	$0.78
Gemini 2.0 Flash	$0.10	$0.40	1.0M tokens	$1.75
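The projected monthly figures match a 3:1 split of the 10M tokens between input and output. For Qwen3, 7.5M × $0.07 + 2.5M × $0.10 = $0.775. For Gemini 2.0 Flash, 7.5M × $0.10 + 2.5M × $0.40 = $1.75. This is a minimal sketch of that calculation, assuming the split, which is inferred from the numbers rather than stated on the page. `projected_monthly_cost` and its `input_share` parameter are hypothetical.

```python
def projected_monthly_cost(
    input_per_m: float,
    output_per_m: float,
    monthly_tokens_m: float = 10.0,
    input_share: float = 0.75,
) -> float:
    """Monthly USD cost for `monthly_tokens_m` million tokens with an assumed input/output split."""
    input_tokens_m = monthly_tokens_m * input_share
    output_tokens_m = monthly_tokens_m * (1.0 - input_share)
    return input_tokens_m * input_per_m + output_tokens_m * output_per_m


qwen_cost = projected_monthly_cost(0.07, 0.10)    # ~0.775, displayed as $0.78
gemini_cost = projected_monthly_cost(0.10, 0.40)  # 1.75

print(f"Qwen3:  ${qwen_cost:.3f}/mo")
print(f"Gemini: ${gemini_cost:.3f}/mo")
```

Raising `input_share` narrows the gap between the two models, because they differ far more on output price ($0.10 vs $0.40) than on input price.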
