How much does o4 Mini cost?

o4 Mini costs $1.10 per million input tokens and $4.40 per million output tokens. For a typical conversation (~2,000 tokens), that's approximately $0.007 per message.

What benchmarks has o4 Mini been tested on?

o4 Mini has been evaluated on 26 benchmarks. Top scores: MATH level 5: 97.8, HELM — IFEval: 92.9, HELM — WildBench: 85.4.

Is o4 Mini open source?

No, o4 Mini is a proprietary model by OpenAI.

How does o4 Mini compare to GPT-5.1?

o4 Mini has an average score of 57.0 while GPT-5.1 scores 57.0. GPT-5.1 slightly outperforms o4 Mini overall. o4 Mini costs $1.10/1M input vs GPT-5.1 at $1.25/1M input. See full comparison →

Home/Models/o4 Mini

o4 Mini

Name: o4 Mini
Price: 1.1 USD
Author: OpenAI

by OpenAI · Released Apr 2025

Multimodal

57.0

avg score

Rank #83

Compare

Better than 64% of all models

Context

200K tokens (~100 books)

Input $/1M

$1.10

Output $/1M

$4.40

Type

multimodal

License

Proprietary

Benchmarks

26 tested

Data updated today

About

OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning...

Tested on 26 benchmarks with 53.2% average. Top scores: MATH level 5 (97.8%), HELM — IFEval (92.9%), HELM — WildBench (85.4%).

Looking for similar performance at lower cost?
Qwen2.5 7B Instruct scores 57.4 (101% as good) at $0.04/1M input · 96% cheaper

Capabilities

coding

47.0

#75 globally

reasoning

44.2

#59 globally

math

56.5

#57 globally

knowledge

54.3

#77 globally

language

92.9

#6 globally

Benchmark Scores

Compare All

Tested on 26 benchmarks · Ranked across 5 categories

Score Distribution (all 233 models)

0255075100

▲ You are here

codingCompare coding →

Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

72.0—

CadEval

Computer-aided design evaluation. Tests understanding of CAD concepts, 3D modeling, and engineering design principles.

62.0—

WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

52.6—

reasoningCompare reasoning →

HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

85.4—

ARC-AGI

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

58.7—

SimpleBench

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

26.4—

mathCompare math →

MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

97.8—

OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

81.7—

HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

72.0—

Quick compare:

vs GPT-5.1

vs Palmyra X5

vs Qwen2.5 Coder 14B Instruct

Excellent (85+) Good (70-85) Average (50-70) Below (<50)

Model Family · OpenAI o4

o4 MiniApr 2025

53.2

$1.10/M in200Kctx26 benchmarks

o4 Mini Deep ResearchOct 2025

$2.00/M in(+0.90)200Kctx

o4 Mini HighApr 2025

72.0+72.0

$1.10/M in(-0.90)200Kctx1 benchmark

See the full o4 family →

Similar Models

Qwen2.5 Coder 14B Instruct

Alibaba

57.0TBD

Links

Info

OpenAI Pricing explorer Developers · API

Research

Technical Report

Documentation

API Docs Playground

Community

@OpenAI

BenchGecko API

o4-mini

Specifications

Typemultimodal
Context200K tokens (~100 books)
ReleasedApr 2025
LicenseProprietary
StatusActive
Cost / Message~$0.007

Available On

OpenAI$1.10

Frequently Asked Questions

o4 Mini is a proprietary multimodal AI model by OpenAI, released in April 2025. It has an average benchmark score of 57.0. Context window: 200K tokens.

Benchmarks

MATH level 5 HELM — IFEval HELM — WildBench HELM — MMLU-Pro OTIS Mock AIME 2024-2025

OpenAI · Provider OpenAI · Economy All Models Compare Models Pricing Developers · API Context Window · Glossary

o4 Mini

Frequently Asked Questions

Related Models

Benchmarks

Related Pages