o3

Name: o3
Price: 2 USD per 1M input tokens
Author: OpenAI

by OpenAI · Published 2025-04-16

55.2

Average score

$2.00/1M

Input price

$8.00/1M

Output price

200K tokens (~100 books)

Context window

multimodal

Type
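
The input and output prices listed above can be combined into a rough cost estimate for a workload. The following is a minimal sketch, not an official calculator: the function name `estimate_cost_usd` is hypothetical, and it assumes flat per-token pricing with no caching or batch discounts and no minimum charges.

```python
# Prices copied from the listing above (USD per 1M tokens).
INPUT_PRICE_PER_1M_USD = 2.00
OUTPUT_PRICE_PER_1M_USD = 8.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for the given token totals.

    Hypothetical helper: assumes flat per-token pricing with no caching
    discounts, batch discounts, or minimum charges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_1M_USD
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_1M_USD
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload: 100,000 input tokens and 25,000 output tokens.
    print(f"${estimate_cost_usd(100_000, 25_000):.2f}")
```

Under these assumptions, the example workload of 100,000 input tokens and 25,000 output tokens comes to about $0.40, split evenly between input and output because output tokens cost four times as much.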

Tested on 33 benchmarks with an average score of 55.2%. Top scores: MATH level 5 (97.8%), Fiction.LiveBench (88.9%), and HELM — IFEval (86.9%).

Benchmark results

Benchmark	Category	Score (%)
MATH level 5	math	97.8
Fiction.LiveBench	knowledge	88.9
HELM — IFEval	language	86.9
HELM — WildBench	reasoning	86.1
HELM — MMLU-Pro	knowledge	85.9
Lech Mazur Writing	knowledge	83.9
OTIS Mock AIME 2024-2025	math	83.9
Aider polyglot	coding	81.3
GPQA diamond	knowledge	75.8
HELM — GPQA	knowledge	75.3
CadEval	coding	74.0
GeoBench	knowledge	74.0
HELM — Omni-MATH	math	71.4
SWE-Bench verified	coding	62.3
ARC-AGI	reasoning	60.8
SWE-Bench Verified (Bash Only)	coding	58.4
SimpleQA Verified	knowledge	53.0
WeirdML	coding	52.4
Professional Reasoning — Legal	knowledge	48.6
Professional Reasoning — Finance	knowledge	47.7
DeepResearch Bench	knowledge	46.6
SimpleBench	reasoning	43.7
Artificial Analysis — Coding Index	speed	38.4
Artificial Analysis — Quality Index	speed	38.4
Artificial Analysis — Agentic Index	speed	36.1
VPCT	knowledge	28.0
OSWorld	agentic	23.0
FrontierMath-2025-02-28-Private	math	18.7
HLE	knowledge	16.3
EnigmaEval	knowledge	13.1
GSO-Bench	coding	8.8
ARC-AGI-2	reasoning	6.5
FrontierMath-Tier-4-2025-07-01-Private	math	2.1