
GPT-4.5 vs Mistral Large 2407

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

GPT-4.5 wins 5 of 5 shared benchmarks. Leads in knowledge · math · reasoning.

Category leads
knowledge · GPT-4.5
math · GPT-4.5
reasoning · GPT-4.5
Hype vs Reality
GPT-4.5 · #166 by perf · no signal · QUIET
Mistral Large 2407 · #147 by perf · no signal · QUIET
Best value
GPT-4.5 · no price
Mistral Large 2407 · 9.8 pts/$ · $4.00/M
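The page does not document how the pts/$ figure is derived. As a rough illustration only, a minimal sketch that assumes pts/$ means the model's mean score across the five shared benchmarks divided by a blended per-million-token price; both assumptions are mine, and the result only approximates the figure shown:

# Illustrative points-per-dollar metric (assumed definition, not the site's formula).
shared_scores = {
    "GPQA diamond": 32.0,
    "Lech Mazur Writing": 69.0,
    "MATH level 5": 44.8,
    "OTIS Mock AIME 2024-2025": 8.4,
    "SimpleBench": 7.0,
}
blended_price = 4.00  # $/1M tokens for Mistral Large 2407, as listed above

mean_score = sum(shared_scores.values()) / len(shared_scores)
print(f"{mean_score / blended_price:.1f} pts/$")  # -> 8.1 under these assumptions
# The page's 9.8 pts/$ likely uses a different benchmark set or weighting.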
Vendor risk
OpenAI · $840.0B · Tier 1 · Medium risk
Mistral AI · $14.0B · Tier 1 · Medium risk
Head to head
GPQA diamond · GPT-4.5 leads by +26.2
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
GPT-4.5 58.3 · Mistral Large 2407 32.0

Lech Mazur Writing · GPT-4.5 leads by +6.6
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
GPT-4.5 75.6 · Mistral Large 2407 69.0

MATH level 5 · GPT-4.5 leads by +33.8
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
GPT-4.5 78.6 · Mistral Large 2407 44.8

OTIS Mock AIME 2024-2025 · GPT-4.5 leads by +29.3
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
GPT-4.5 37.7 · Mistral Large 2407 8.4

SimpleBench · GPT-4.5 leads by +14.4
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
GPT-4.5 21.4 · Mistral Large 2407 7.0
Full benchmark table
Benchmark · GPT-4.5 · Mistral Large 2407
GPQA diamond · 58.3 · 32.0
Lech Mazur Writing · 75.6 · 69.0
MATH level 5 · 78.6 · 44.8
OTIS Mock AIME 2024-2025 · 37.7 · 8.4
SimpleBench · 21.4 · 7.0
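To recompute the winner summary from this table, a minimal sketch in Python; the scores are those listed above, while the tallying helper is illustrative rather than the site's own code:

# Tally head-to-head wins and margins from the shared-benchmark scores.
scores = {  # benchmark: (GPT-4.5, Mistral Large 2407)
    "GPQA diamond": (58.3, 32.0),
    "Lech Mazur Writing": (75.6, 69.0),
    "MATH level 5": (78.6, 44.8),
    "OTIS Mock AIME 2024-2025": (37.7, 8.4),
    "SimpleBench": (21.4, 7.0),
}

wins = 0
for name, (gpt45, mistral) in scores.items():
    margin = gpt45 - mistral
    if margin > 0:
        wins += 1
    print(f"{name}: GPT-4.5 leads by {margin:+.1f}")
print(f"GPT-4.5 wins {wins} of {len(scores)} shared benchmarks")
# Note: from these rounded scores the GPQA margin comes out to +26.3;
# the page's +26.2 suggests its deltas are computed from unrounded values.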
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
GPT-4.5 · no price listed
Mistral Large 2407 · $2.00 · $6.00 · 131K tokens (~66 books) · $30.00
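The projected monthly figure follows from the per-token prices once a token split is assumed. A minimal sketch, assuming a 3:1 input-to-output split (7.5M input + 2.5M output tokens per month), which reproduces the $30.00 shown; the page does not state its actual split assumption:

# Project monthly spend from per-1M-token prices (the split ratio is assumed).
input_price = 2.00     # $ per 1M input tokens, Mistral Large 2407
output_price = 6.00    # $ per 1M output tokens
monthly_tokens = 10.0  # millions of tokens per month, per the table header
input_share = 0.75     # assumed 3:1 input:output split

cost = monthly_tokens * (input_share * input_price
                         + (1 - input_share) * output_price)
print(f"projected spend: ${cost:.2f}/mo")  # -> $30.00/mo

A 1:1 split would instead give $40.00/mo, so the split assumption matters when comparing providers.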