StarCoder 2 15B vs Gemma 2B

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Gemma 2B wins on 6/10 benchmarks

Gemma 2B wins 6 of the 10 shared benchmarks and leads the knowledge, general, and reasoning categories.

Category leads

knowledge · Gemma 2B
math · StarCoder 2 15B
general · Gemma 2B
language · StarCoder 2 15B
reasoning · Gemma 2B

Hype vs Reality

Attention vs performance

StarCoder 2 15B

#204 by perf·no signal

QUIET

Gemma 2B

#189 by perf·no signal

QUIET

Best value

Pricing unknown

StarCoder 2 15B

—

no price

Gemma 2B

—

no price

Vendor risk

Who is behind the model

StarCoder 2 15B

Unknown vendor · private · undisclosed

Risk unknown

Gemma 2B

Google DeepMind

$4.00T · Tier 1

Low risk

Head to head

10 benchmarks · 2 models

StarCoder 2 15B · Gemma 2B

ARC AI2

StarCoder 2 15B leads by +6.8

AI2 Reasoning Challenge · tests grade-school level science knowledge with multiple-choice questions requiring reasoning beyond simple retrieval.

StarCoder 2 15B

29.6

Gemma 2B

22.8

GSM8K

StarCoder 2 15B leads by +40.0

Grade School Math 8K · 8,500 linguistically diverse grade-school math word problems that require multi-step reasoning to solve.

StarCoder 2 15B

57.7

Gemma 2B

17.7

BBH (HuggingFace)

Gemma 2B leads by +0.8

StarCoder 2 15B

20.4

Gemma 2B

21.1

GPQA

Gemma 2B leads by +1.8

StarCoder 2 15B

3.1

Gemma 2B

4.9

IFEval

StarCoder 2 15B leads by +1.2

StarCoder 2 15B

27.8

Gemma 2B

26.6

MATH Level 5

Gemma 2B leads by +1.4

StarCoder 2 15B

6.0

Gemma 2B

7.4

MMLU-PRO

Gemma 2B leads by +6.6

StarCoder 2 15B

15.0

Gemma 2B

21.6

MUSR

Gemma 2B leads by +8.1

StarCoder 2 15B

2.9

Gemma 2B

11.0

MMLU

StarCoder 2 15B leads by +29.1

Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.

StarCoder 2 15B

52.1

Gemma 2B

23.1

Winogrande

Gemma 2B leads by +2.2

WinoGrande · large-scale commonsense reasoning benchmark where models must resolve ambiguous pronouns in carefully constructed sentence pairs.

StarCoder 2 15B

28.6

Gemma 2B

30.8

Full benchmark table

Benchmark	StarCoder 2 15B	Gemma 2B
ARC AI2	29.6	22.8
GSM8K	57.7	17.7
BBH (HuggingFace)	20.4	21.1
GPQA	3.1	4.9
IFEval	27.8	26.6
MATH Level 5	6.0	7.4
MMLU-PRO	15.0	21.6
MUSR	2.9	11.0
MMLU	52.1	23.1
Winogrande	28.6	30.8
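
The win count and per-benchmark leads can be recomputed from the table. The following is a minimal sketch, not the page's own method: the function name `head_to_head` and the tie handling are assumptions, and the scores are copied from the table as displayed. Margins computed from these rounded scores can differ by 0.1 from the leads shown in the cards (BBH and MMLU here), possibly because the page works from unrounded values.

```python
# Scores copied from the benchmark table: (StarCoder 2 15B, Gemma 2B).
SCORES = {
    "ARC AI2": (29.6, 22.8),
    "GSM8K": (57.7, 17.7),
    "BBH (HuggingFace)": (20.4, 21.1),
    "GPQA": (3.1, 4.9),
    "IFEval": (27.8, 26.6),
    "MATH Level 5": (6.0, 7.4),
    "MMLU-PRO": (15.0, 21.6),
    "MUSR": (2.9, 11.0),
    "MMLU": (52.1, 23.1),
    "Winogrande": (28.6, 30.8),
}
MODELS = ("StarCoder 2 15B", "Gemma 2B")


def head_to_head(scores):
    """Return (wins per model, {benchmark: (leader, margin)}).

    Hypothetical helper: a higher score wins, and a tie counts for neither model.
    """
    wins = {name: 0 for name in MODELS}
    leads = {}
    for bench, (a, b) in scores.items():
        if a == b:
            leads[bench] = (None, 0.0)  # tie: no leader
            continue
        leader = MODELS[0] if a > b else MODELS[1]
        wins[leader] += 1
        # Round to one decimal to match the precision of the displayed scores.
        leads[bench] = (leader, round(abs(a - b), 1))
    return wins, leads


wins, leads = head_to_head(SCORES)
print(wins)
for bench, (leader, margin) in leads.items():
    print(f"{bench}: {leader} leads by +{margin}")
```

The tally agrees with the winner summary: Gemma 2B takes 6 of the 10 benchmarks and StarCoder 2 15B takes 4.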

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input	Output	Context	Projected $/mo
StarCoder 2 15B	—	—	—	—
Gemma 2B	—	—	—	—
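
For reference, a projected monthly figure at 10M tokens could be derived from per-1M-token prices roughly as follows. This is a sketch under stated assumptions: the page does not say how input and output tokens are weighted, so the `input_share` parameter and the function name `projected_monthly_cost` are hypothetical. With no price listed for either model, the projection stays undefined.

```python
from typing import Optional


def projected_monthly_cost(
    input_price: Optional[float],
    output_price: Optional[float],
    monthly_tokens: int = 10_000_000,
    input_share: float = 0.5,  # assumption: the page does not state the input/output split
) -> Optional[float]:
    """Estimate monthly spend from per-1M-token prices.

    Returns None when either price is missing, as in the table above.
    """
    if input_price is None or output_price is None:
        return None
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    millions = monthly_tokens / 1_000_000
    blended_price = input_share * input_price + (1.0 - input_share) * output_price
    return millions * blended_price


# Neither model has a listed price, so no projection is available.
print(projected_monthly_cost(None, None))
```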