o4 Mini vs o3
Side by side. Every metric. Every benchmark.
| Spec | o4 Mini | o3 |
|---|---|---|
| Provider | OpenAI | OpenAI |
| Average score | 53.2 | 55.2 |
| Input price (per 1M tokens) | $1.10 | $2.00 |
| Output price (per 1M tokens) | $4.40 | $8.00 |
| Context window | 200K tokens (~150K English words) | 200K tokens (~150K English words) |
| Release date | 2025-04-16 | 2025-04-16 |
| Open source | No (proprietary) | No (proprietary) |
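To turn the list prices into a per-request figure, a minimal sketch follows. It assumes the prices are USD per 1M tokens, ignores cached-input and batch discounts, and assumes that hidden reasoning tokens are billed as output tokens; the `estimate_cost` helper and the 10K/2K token workload are hypothetical, for illustration only.

```python
# Hedged sketch: estimate one request's cost from the per-1M-token list prices above.
# Assumptions: USD per 1M tokens; no cached-input or batch discounts;
# reasoning tokens are counted as output tokens.

PRICES_PER_MILLION = {
    # model: (input USD per 1M tokens, output USD per 1M tokens), copied from the table
    "o4 Mini": (1.10, 4.40),
    "o3": (2.00, 8.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request (hypothetical helper)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10K prompt tokens, 2K output tokens (reasoning included).
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 10_000, 2_000):.4f}")
```

Because both o4 Mini prices are exactly 55% of the o3 prices (1.10 / 2.00 = 4.40 / 8.00 = 0.55), o4 Mini costs 55% of o3 for any mix of input and output tokens, provided both models emit the same number of tokens for the task.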
Benchmark scores
24 benchmarks · wins: o4 Mini 7, o3 17
| Benchmark | Category | o4 Mini | o3 |
|---|---|---|---|
| Aider polyglot | coding | 72.0 | 81.3 |
| ARC-AGI | reasoning | 58.7 | 60.8 |
| ARC-AGI-2 | reasoning | 6.1 | 6.5 |
| CadEval | coding | 62.0 | 74.0 |
| Fiction.LiveBench | knowledge | 77.8 | 88.9 |
| FrontierMath-2025-02-28-Private | math | 24.8 | 18.7 |
| FrontierMath-Tier-4-2025-07-01-Private | math | 6.3 | 2.1 |
| GeoBench | knowledge | 64.0 | 74.0 |
| GPQA diamond | knowledge | 72.8 | 75.8 |
| GSO-Bench | coding | 3.6 | 8.8 |
| HELM — GPQA | knowledge | 73.5 | 75.3 |
| HELM — IFEval | language | 92.9 | 86.9 |
| HELM — MMLU-Pro | knowledge | 82.0 | 85.9 |
| HELM — Omni-MATH | math | 72.0 | 71.4 |
| HELM — WildBench | reasoning | 85.4 | 86.1 |
| HLE | knowledge | 13.9 | 16.3 |
| Lech Mazur Writing | knowledge | 75.0 | 83.9 |
| MATH level 5 | math | 97.8 | 97.8 |
| OTIS Mock AIME 2024-2025 | math | 81.7 | 83.9 |
| SimpleBench | reasoning | 26.4 | 43.7 |
| SimpleQA Verified | knowledge | 23.9 | 53.0 |
| SWE-Bench Verified (Bash Only) | coding | 45.0 | 58.4 |
| VPCT | knowledge | 36.3 | 28.0 |
| WeirdML | coding | 52.6 | 52.4 |
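The head-to-head tally can be reproduced from the table with a short script. This is a sketch under two assumptions: higher is better on every benchmark, and only the rounded scores shown are available. On those rounded values MATH level 5 is a tie (97.8 vs 97.8), so a strict comparison gives o4 Mini 6 wins, o3 17, and 1 tie; the 7/17 split above may be based on unrounded scores. The `tally` helper is hypothetical.

```python
# Hedged sketch: tally head-to-head results from the benchmark table above.
# Assumptions: higher is better on every benchmark; scores are the rounded values shown.
from collections import Counter, defaultdict

# (benchmark, category, o4 Mini score, o3 score), copied from the table
SCORES = [
    ("Aider polyglot", "coding", 72.0, 81.3),
    ("ARC-AGI", "reasoning", 58.7, 60.8),
    ("ARC-AGI-2", "reasoning", 6.1, 6.5),
    ("CadEval", "coding", 62.0, 74.0),
    ("Fiction.LiveBench", "knowledge", 77.8, 88.9),
    ("FrontierMath-2025-02-28-Private", "math", 24.8, 18.7),
    ("FrontierMath-Tier-4-2025-07-01-Private", "math", 6.3, 2.1),
    ("GeoBench", "knowledge", 64.0, 74.0),
    ("GPQA diamond", "knowledge", 72.8, 75.8),
    ("GSO-Bench", "coding", 3.6, 8.8),
    ("HELM — GPQA", "knowledge", 73.5, 75.3),
    ("HELM — IFEval", "language", 92.9, 86.9),
    ("HELM — MMLU-Pro", "knowledge", 82.0, 85.9),
    ("HELM — Omni-MATH", "math", 72.0, 71.4),
    ("HELM — WildBench", "reasoning", 85.4, 86.1),
    ("HLE", "knowledge", 13.9, 16.3),
    ("Lech Mazur Writing", "knowledge", 75.0, 83.9),
    ("MATH level 5", "math", 97.8, 97.8),
    ("OTIS Mock AIME 2024-2025", "math", 81.7, 83.9),
    ("SimpleBench", "reasoning", 26.4, 43.7),
    ("SimpleQA Verified", "knowledge", 23.9, 53.0),
    ("SWE-Bench Verified (Bash Only)", "coding", 45.0, 58.4),
    ("VPCT", "knowledge", 36.3, 28.0),
    ("WeirdML", "coding", 52.6, 52.4),
]


def tally(rows):
    """Count overall and per-category wins (and ties) for each model."""
    overall = Counter()
    by_category = defaultdict(Counter)
    for _name, category, o4_mini, o3 in rows:
        if o4_mini > o3:
            winner = "o4 Mini"
        elif o3 > o4_mini:
            winner = "o3"
        else:
            winner = "tie"
        overall[winner] += 1
        by_category[category][winner] += 1
    return overall, by_category


if __name__ == "__main__":
    overall, by_category = tally(SCORES)
    print(dict(overall))
    for category in sorted(by_category):
        print(category, dict(by_category[category]))
```

Broken down by category, o4 Mini's wins cluster in math (both FrontierMath sets and HELM Omni-MATH), while o3 leads every reasoning benchmark and all but one coding and knowledge benchmark.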