
XGen-7B vs MPT-30B

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

MPT-30B wins all 6 shared benchmarks and leads the knowledge category.

Category leads
Knowledge · MPT-30B
Hype vs Reality
XGen-7B · #173 by perf · no signal · QUIET
MPT-30B · #182 by perf · no signal · QUIET
Best value
XGen-7B · no price
MPT-30B · no price
Vendor risk
Unknown · private · undisclosed · Unknown
Unknown · private · undisclosed · Unknown
Head to head
XGen-7B vs MPT-30B
ARC AI2
MPT-30B leads by +12.5
AI2 Reasoning Challenge · tests grade-school level science knowledge with multiple-choice questions requiring reasoning beyond simple retrieval.
XGen-7B 21.6 · MPT-30B 34.1
HellaSwag
MPT-30B leads by +2.9
HellaSwag · tests commonsense reasoning by asking models to predict the most plausible continuation of everyday scenarios.
XGen-7B 65.6 · MPT-30B 68.5
MMLU
MPT-30B leads by +15.4
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
XGen-7B 15.1 · MPT-30B 30.5
OpenBookQA
MPT-30B leads by +15.7
OpenBookQA · science questions that require combining a given core fact with broad common knowledge, mimicking an open-book exam setting.
XGen-7B 20.3 · MPT-30B 36.0
PIQA
MPT-30B leads by +12.8
PIQA (Physical Interaction QA) · tests intuitive physical reasoning by asking models to select the correct approach for everyday physical tasks.
XGen-7B 51.0 · MPT-30B 63.8
WinoGrande
MPT-30B leads by +12.2
WinoGrande · large-scale commonsense reasoning benchmark where models must resolve ambiguous pronouns in carefully constructed sentence pairs.
XGen-7B 29.8 · MPT-30B 42.0
Full benchmark table
Benchmark    XGen-7B   MPT-30B
ARC AI2      21.6      34.1
HellaSwag    65.6      68.5
MMLU         15.1      30.5
OpenBookQA   20.3      36.0
PIQA         51.0      63.8
WinoGrande   29.8      42.0
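The win count and per-benchmark margins can be recomputed from the displayed scores. The following is a minimal sketch, not the site's own method: the `SCORES` dictionary and the `head_to_head` helper are hypothetical names introduced here, and the sketch assumes a margin is simply the MPT-30B score minus the XGen-7B score, rounded to one decimal.

```python
# Scores copied from the benchmark table: (XGen-7B, MPT-30B).
SCORES = {
    "ARC AI2": (21.6, 34.1),
    "HellaSwag": (65.6, 68.5),
    "MMLU": (15.1, 30.5),
    "OpenBookQA": (20.3, 36.0),
    "PIQA": (51.0, 63.8),
    "WinoGrande": (29.8, 42.0),
}


def head_to_head(scores, names=("XGen-7B", "MPT-30B")):
    """Return (wins per model, margin per benchmark).

    Assumption: margin = second model's score minus the first's,
    rounded to one decimal. A tie counts as a win for neither model.
    """
    wins = {name: 0 for name in names}
    margins = {}
    for bench, (first, second) in scores.items():
        margin = round(second - first, 1)
        margins[bench] = margin
        if margin > 0:
            wins[names[1]] += 1
        elif margin < 0:
            wins[names[0]] += 1
    return wins, margins


wins, margins = head_to_head(SCORES)
for bench, margin in margins.items():
    print(f"{bench}: MPT-30B minus XGen-7B = {margin:+.1f}")
print(wins)
```

Because the inputs are already rounded to one decimal, a margin computed this way can differ by 0.1 from one derived from unrounded scores.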
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
XGen-7B
MPT-30B