o3

by OpenAI · Released 2025-04-16

51.3

Average score

$2.00 / 1M tokens

Input price

$8.00 / 1M tokens

Output price

200K tokens (~100 books)

Context window

multimodal

Type
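
The listed prices can be turned into a rough per-request cost estimate. The sketch below assumes billing is linear in token count at the input and output rates shown on this page, and ignores any discounts or charges not listed here. The function name `estimate_cost` and the example token counts are hypothetical, chosen only for illustration.

```python
# Prices as listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (assumes simple linear pricing)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost(10_000, 2_000):.4f}")
```

At these rates an output token costs four times as much as an input token, so long responses tend to dominate the bill more than long prompts do.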

Tested on 21 benchmarks, with an average score of 51.3%. Top scores: MATH level 5 (97.8%), Fiction.LiveBench (88.9%), and Lech Mazur Writing (83.9%, tied with OTIS Mock AIME 2024-2025).

Benchmark scores

Benchmark	Category	Score (%)
MATH level 5	math	97.8
Fiction.LiveBench	knowledge	88.9
Lech Mazur Writing	knowledge	83.9
OTIS Mock AIME 2024-2025	math	83.9
Aider polyglot	coding	81.3
GPQA diamond	knowledge	75.8
CadEval	coding	74.0
GeoBench	knowledge	74.0
ARC-AGI	reasoning	60.8
SWE-Bench Verified (Bash Only)	coding	58.4
SimpleQA Verified	knowledge	53.0
WeirdML	coding	52.4
DeepResearch Bench	knowledge	46.6
SimpleBench	reasoning	43.7
VPCT	knowledge	28.0
OSWorld	agentic	23.0
FrontierMath-2025-02-28-Private	math	18.7
HLE	knowledge	16.3
GSO-Bench	coding	8.8
ARC-AGI-2	reasoning	6.5
FrontierMath-Tier-4-2025-07-01-Private	math	2.1
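
The headline average can be checked directly against the table. The sketch below assumes the page's average is a plain unweighted mean over all 21 benchmark scores; the page does not state the method, but the numbers are consistent with it. The variable names are illustrative.

```python
# Scores copied from the table above (percent).
scores = {
    "MATH level 5": 97.8,
    "Fiction.LiveBench": 88.9,
    "Lech Mazur Writing": 83.9,
    "OTIS Mock AIME 2024-2025": 83.9,
    "Aider polyglot": 81.3,
    "GPQA diamond": 75.8,
    "CadEval": 74.0,
    "GeoBench": 74.0,
    "ARC-AGI": 60.8,
    "SWE-Bench Verified (Bash Only)": 58.4,
    "SimpleQA Verified": 53.0,
    "WeirdML": 52.4,
    "DeepResearch Bench": 46.6,
    "SimpleBench": 43.7,
    "VPCT": 28.0,
    "OSWorld": 23.0,
    "FrontierMath-2025-02-28-Private": 18.7,
    "HLE": 16.3,
    "GSO-Bench": 8.8,
    "ARC-AGI-2": 6.5,
    "FrontierMath-Tier-4-2025-07-01-Private": 2.1,
}

# Unweighted mean across all benchmarks (assumed aggregation method).
mean_score = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"{len(scores)} benchmarks, mean = {mean_score:.1f}%")
```

Because the mean is unweighted, near-saturated benchmarks such as MATH level 5 and very hard ones such as the FrontierMath tiers count equally, so the single figure says little about performance in any one category.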