MPT-30B vs XGen-7B

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

MPT-30B wins 6 of 6 shared benchmarks. Leads in knowledge.

Category leads
knowledge · MPT-30B
Hype vs Reality
MPT-30B
#182 by perf · no signal
QUIET
XGen-7B
#173 by perf · no signal
QUIET
Best value
MPT-30B
no price
XGen-7B
no price
Vendor risk
Unknown
private · undisclosed
Unknown
Unknown
private · undisclosed
Unknown
Head to head
MPT-30B · XGen-7B
ARC AI2
MPT-30B leads by +12.5
AI2 Reasoning Challenge · tests grade-school level science knowledge with multiple-choice questions requiring reasoning beyond simple retrieval.
MPT-30B
34.1
XGen-7B
21.6
HellaSwag
MPT-30B leads by +2.9
HellaSwag · tests commonsense reasoning by asking models to predict the most plausible continuation of everyday scenarios.
MPT-30B
68.5
XGen-7B
65.6
MMLU
MPT-30B leads by +15.5
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
MPT-30B
30.5
XGen-7B
15.1
OpenBookQA
MPT-30B leads by +15.7
OpenBookQA · science questions that require combining a given core fact with broad common knowledge, mimicking an open-book exam setting.
MPT-30B
36.0
XGen-7B
20.3
PIQA
MPT-30B leads by +12.8
PIQA (Physical Interaction QA) · tests intuitive physical reasoning by asking models to select the correct approach for everyday physical tasks.
MPT-30B
63.8
XGen-7B
51.0
WinoGrande
MPT-30B leads by +12.2
WinoGrande · large-scale commonsense reasoning benchmark where models must resolve ambiguous pronouns in carefully constructed sentence pairs.
MPT-30B
42.0
XGen-7B
29.8
Full benchmark table
Benchmark      MPT-30B   XGen-7B
ARC AI2           34.1      21.6
HellaSwag         68.5      65.6
MMLU              30.5      15.1
OpenBookQA        36.0      20.3
PIQA              63.8      51.0
WinoGrande        42.0      29.8
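
The "leads by" margins and the "6 of 6" win count can be recomputed from the listed scores. The following is a minimal Python sketch, not the site's own method: the names SCORES and compare are hypothetical, and it assumes a margin is simply the first model's score minus the second's, rounded to one decimal.

```python
# Scores copied from the benchmark table, as (MPT-30B, XGen-7B).
SCORES = {
    "ARC AI2": (34.1, 21.6),
    "HellaSwag": (68.5, 65.6),
    "MMLU": (30.5, 15.1),
    "OpenBookQA": (36.0, 20.3),
    "PIQA": (63.8, 51.0),
    "WinoGrande": (42.0, 29.8),
}


def compare(scores):
    """Return per-benchmark margins (first minus second) and the first model's win count.

    Assumption: the margin is a plain difference of the displayed scores,
    rounded to one decimal place.
    """
    margins = {name: round(a - b, 1) for name, (a, b) in scores.items()}
    wins = sum(1 for m in margins.values() if m > 0)
    return margins, wins


margins, wins = compare(SCORES)
for name, m in margins.items():
    leader = "MPT-30B" if m > 0 else "XGen-7B"
    print(f"{name}: {leader} leads by +{abs(m):.1f}")
print(f"MPT-30B wins {wins} of {len(SCORES)} shared benchmarks")
```

Recomputed this way, five of the six margins match the page. For MMLU the displayed scores give 15.4 where the page shows +15.5; one possible explanation (an assumption, not confirmed by the page) is that the page derives margins from unrounded scores.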
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
MPT-30B · no price
XGen-7B · no price