GPT-4 (older v0314) vs GPT-4 Turbo
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
GPT-4 (older v0314) wins on 3 of 5 benchmarks
GPT-4 (older v0314) wins 3 of the 5 shared benchmarks, ties 1 (Winogrande), and trails on 1. Leads in knowledge · math.
Category leads
knowledge · GPT-4 (older v0314)
math · GPT-4 (older v0314)
Hype vs Reality
Attention vs performance
GPT-4 (older v0314)
#72 by perf · no signal
GPT-4 Turbo
#90 by perf · no signal
Best value
GPT-4 Turbo
2.1x better value than GPT-4 (older v0314)
GPT-4 (older v0314)
1.2 pts/$
$45.00/M
GPT-4 Turbo
2.5 pts/$
$20.00/M
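The "pts/$" figures appear consistent with a simple formula: the mean of the five shared benchmark scores divided by a blended $/M price (input and output averaged 50/50). The sketch below reproduces the listed numbers under that assumption; the helper names are illustrative, not from the site.

```python
# Sketch of the value metric, assuming pts/$ = mean benchmark score / blended price.
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """50/50 blend of input and output price per 1M tokens."""
    return (input_per_m + output_per_m) / 2

def value_pts_per_dollar(scores: list[float], blended: float) -> float:
    """Average benchmark points bought per blended dollar per 1M tokens."""
    return sum(scores) / len(scores) / blended

# Scores in table order: GPQA diamond, GSM8K, MMLU, OTIS Mock AIME, Winogrande.
gpt4_scores = [14.3, 92.0, 81.9, 0.5, 75.0]
turbo_scores = [7.5, 90.0, 76.5, 1.0, 75.0]

gpt4_value = value_pts_per_dollar(gpt4_scores, blended_price(30.0, 60.0))
turbo_value = value_pts_per_dollar(turbo_scores, blended_price(10.0, 30.0))

print(round(gpt4_value, 1))                 # 1.2 pts/$ at $45.00/M blended
print(round(turbo_value, 1))                # 2.5 pts/$ at $20.00/M blended
print(round(turbo_value / gpt4_value, 1))   # 2.1x better value
```

Both headline figures and the 2.1x ratio fall out of this one assumption about blending.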
Vendor risk
Who is behind the model
GPT-4 (older v0314) · OpenAI · $840.0B · Tier 1
GPT-4 Turbo · OpenAI · $840.0B · Tier 1
Head to head
5 benchmarks · 2 models
GPQA diamond
GPT-4 (older v0314) leads by +6.8
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
GPT-4 (older v0314)
14.3
GPT-4 Turbo
7.5
GSM8K
GPT-4 (older v0314) leads by +2.0
Grade School Math 8K · 8,500 linguistically diverse grade-school math word problems that require multi-step reasoning to solve.
GPT-4 (older v0314)
92.0
GPT-4 Turbo
90.0
MMLU
GPT-4 (older v0314) leads by +5.3
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
GPT-4 (older v0314)
81.9
GPT-4 Turbo
76.5
OTIS Mock AIME 2024-2025
GPT-4 Turbo leads by +0.6
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
GPT-4 (older v0314)
0.5
GPT-4 Turbo
1.0
Winogrande
Tied at 75.0
WinoGrande · large-scale commonsense reasoning benchmark where models must resolve ambiguous pronouns in carefully constructed sentence pairs.
GPT-4 (older v0314)
75.0
GPT-4 Turbo
75.0
Full benchmark table
| Benchmark | GPT-4 (older v0314) | GPT-4 Turbo |
|---|---|---|
| GPQA diamond | 14.3 | 7.5 |
| GSM8K | 92.0 | 90.0 |
| MMLU | 81.9 | 76.5 |
| OTIS Mock AIME 2024-2025 | 0.5 | 1.0 |
| Winogrande | 75.0 | 75.0 |
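The head-to-head record can be tallied directly from the table above; this short sketch counts wins, losses, and ties per model.

```python
# Benchmark scores from the table: (GPT-4 older v0314, GPT-4 Turbo).
results = {
    "GPQA diamond": (14.3, 7.5),
    "GSM8K": (92.0, 90.0),
    "MMLU": (81.9, 76.5),
    "OTIS Mock AIME 2024-2025": (0.5, 1.0),
    "Winogrande": (75.0, 75.0),
}

wins_v0314 = sum(a > b for a, b in results.values())
wins_turbo = sum(b > a for a, b in results.values())
ties = sum(a == b for a, b in results.values())

print(wins_v0314, wins_turbo, ties)  # 3 1 1
```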
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| GPT-4 (older v0314) | $30.00 | $60.00 | 8K tokens (~4 books) | $375.00 |
| GPT-4 Turbo | $10.00 | $30.00 | 128K tokens (~64 books) | $150.00 |
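The projected $/mo column matches a 10M-token month split 75% input / 25% output (e.g. $375.00 = 7.5M × $30/M + 2.5M × $60/M); that split is an inference from the listed totals, not stated by the page. A minimal sketch under that assumption:

```python
# Projected monthly cost at a fixed token budget, assuming a 75/25 input/output split.
def projected_monthly(input_per_m: float, output_per_m: float,
                      total_m: float = 10.0, input_share: float = 0.75) -> float:
    """Cost in dollars for total_m million tokens at the given per-1M prices."""
    input_m = input_share * total_m
    output_m = (1 - input_share) * total_m
    return input_m * input_per_m + output_m * output_per_m

print(projected_monthly(30.0, 60.0))  # 375.0  (GPT-4, older v0314)
print(projected_monthly(10.0, 30.0))  # 150.0  (GPT-4 Turbo)
```

Shifting `input_share` changes the totals materially, since output tokens cost 2-3x more than input tokens for both models.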