Nemotron-4 15B vs Qwen2.5 Coder 7B Instruct
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen2.5 Coder 7B Instruct wins 3 of the 5 shared benchmarks, leading in the knowledge and math categories.
Category leads
- Knowledge: Qwen2.5 Coder 7B Instruct
- Math: Qwen2.5 Coder 7B Instruct
Hype vs Reality
Attention vs performance
- Nemotron-4 15B: #78 by performance · no attention signal
- Qwen2.5 Coder 7B Instruct: #120 by performance · no attention signal
Best value
Winner: Qwen2.5 Coder 7B Instruct
- Nemotron-4 15B: no public price
- Qwen2.5 Coder 7B Instruct: 740.0 pts/$ at $0.06/M tokens
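The page does not disclose how its value metric is computed. A minimal sketch, assuming pts/$ is a composite benchmark score divided by a blended price per 1M tokens (a 50/50 input/output blend matches the $0.06/M shown, and the listed 740.0 pts/$ would then imply a composite score of about 44.4 points — both the formula and the implied score are assumptions, not stated on the page):

```python
# Hypothetical value metric: benchmark points per dollar of spend,
# assuming pts/$ = composite score / blended price per 1M tokens.
def points_per_dollar(composite_score: float, price_per_m: float) -> float:
    """Benchmark points earned per dollar, under the assumed formula."""
    return composite_score / price_per_m

# Blended $/M as a simple 50/50 average of input and output prices
# (one assumption consistent with the $0.06/M shown for Qwen2.5 Coder).
blended = (0.03 + 0.09) / 2       # 0.06
implied_score = 740.0 * blended   # ~44.4, reverse-engineered, hypothetical
print(round(points_per_dollar(implied_score, blended), 1))  # -> 740.0
```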
Vendor risk
Who is behind each model
- Nemotron-4 15B: Unknown · private · undisclosed
- Qwen2.5 Coder 7B Instruct: Alibaba (Qwen) · $293.0B · Tier 1
Head to head
5 benchmarks · 2 models
ARC AI2 · Qwen2.5 Coder 7B Instruct leads by +7.2
AI2 Reasoning Challenge · tests grade-school level science knowledge with multiple-choice questions requiring reasoning beyond simple retrieval.
- Nemotron-4 15B: 40.7
- Qwen2.5 Coder 7B Instruct: 47.9

GSM8K · Qwen2.5 Coder 7B Instruct leads by +40.7
Grade School Math 8K · 8,500 linguistically diverse grade-school math word problems that require multi-step reasoning to solve.
- Nemotron-4 15B: 46.0
- Qwen2.5 Coder 7B Instruct: 86.7
HellaSwag · Nemotron-4 15B leads by +7.4
HellaSwag · tests commonsense reasoning by asking models to predict the most plausible continuation of everyday scenarios.
- Nemotron-4 15B: 76.5
- Qwen2.5 Coder 7B Instruct: 69.1
MMLU · Qwen2.5 Coder 7B Instruct leads by +12.4
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
- Nemotron-4 15B: 44.9
- Qwen2.5 Coder 7B Instruct: 57.3

Winogrande · Nemotron-4 15B leads by +10.2
WinoGrande · large-scale commonsense reasoning benchmark where models must resolve ambiguous pronouns in carefully constructed sentence pairs.
- Nemotron-4 15B: 56.0
- Qwen2.5 Coder 7B Instruct: 45.8
Full benchmark table
| Benchmark | Nemotron-4 15B | Qwen2.5 Coder 7B Instruct |
|---|---|---|
| ARC AI2 | 40.7 | 47.9 |
| GSM8K | 46.0 | 86.7 |
| HellaSwag | 76.5 | 69.1 |
| MMLU | 44.9 | 57.3 |
| Winogrande | 56.0 | 45.8 |
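The 3-of-5 win count in the summary can be reproduced directly from the scores above; a quick sketch:

```python
# Recompute each model's benchmark wins from the full benchmark table.
# Tuples are (Nemotron-4 15B, Qwen2.5 Coder 7B Instruct).
scores = {
    "ARC AI2":    (40.7, 47.9),
    "GSM8K":      (46.0, 86.7),
    "HellaSwag":  (76.5, 69.1),
    "MMLU":       (44.9, 57.3),
    "Winogrande": (56.0, 45.8),
}

nemotron_wins = sum(1 for a, b in scores.values() if a > b)
qwen_wins = sum(1 for a, b in scores.values() if b > a)
print(nemotron_wins, qwen_wins)  # -> 2 3
```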
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Nemotron-4 15B | — | — | — | — |
| Qwen2.5 Coder 7B Instruct | $0.03 | $0.09 | 33K tokens (~16 books) | $0.45 |
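The page does not state what input/output token split underlies the monthly projection. A minimal sketch of the arithmetic, assuming a 75/25 input/output split, which is one split that reproduces the listed $0.45/mo at 10M tokens for Qwen2.5 Coder 7B Instruct:

```python
# Projected monthly cost at a given token volume, assuming a fixed
# input/output split. The 75/25 split here is an assumption: the page
# does not disclose it, but it reproduces the listed $0.45 figure.
def monthly_cost(input_per_m: float, output_per_m: float,
                 total_m_tokens: float = 10.0,
                 input_share: float = 0.75) -> float:
    """Dollars per month for total_m_tokens million tokens."""
    input_cost = input_per_m * total_m_tokens * input_share
    output_cost = output_per_m * total_m_tokens * (1 - input_share)
    return input_cost + output_cost

# Qwen2.5 Coder 7B Instruct: $0.03/M input, $0.09/M output.
print(round(monthly_cost(0.03, 0.09), 2))  # -> 0.45
```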