LIVETracking 994 AI models from 267 providers.

Gecko Tests·powered by GeckoBench · AI Bias, Censorship, IQ & Politics View Gecko Tests Build your own chart

Home/Models/o3 Mini

o3 Mini

by OpenAI · Released Jan 2025

37.8

avg score

Rank #156

Better than 33% of all models

Context

200K tokens (~100 books)

Input $/1M

$1.10

Output $/1M

$4.40

Type

text

License

Proprietary

Benchmarks

17 tested

Data updated today

About

OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. This model supports the `reasoning_effort` parameter, which can be set to...

Tested on 17 benchmarks with 38.4% average. Top scores: Chatbot Arena Elo — Overall (1347.5%), MATH level 5 (96.5%), OTIS Mock AIME 2024-2025 (76.9%).

Looking for similar performance at lower cost?
Mistral Nemo scores 37.4 (99% as good) at $0.02/1M input · 98% cheaper

Capabilities

coding

36.4

#105 globally

reasoning

15.0

#112 globally

math

47.5

#82 globally

knowledge

49.5

#104 globally

Benchmark Scores

Tested on 17 benchmarks · Ranked across 5 categories

Score Distribution (all 233 models)

0255075100

▲ You are here

codingCompare coding →

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

60.4—

Computer-aided design evaluation. Tests understanding of CAD concepts, 3D modeling, and engineering design principles.

54.0—

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

43.7—

reasoningCompare reasoning →

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

34.5—

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

7.4—

ARC-AGI 2, harder sequel to ARC. More complex abstract reasoning patterns that test generalization ability beyond training data.

3.0—

mathCompare math →

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

96.5—

OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

76.9—

FrontierMath-2025-02-28-Private

Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.

12.4—

Quick compare:

vs Llama 3.2 90B

Excellent (85+) Good (70-85) Average (50-70) Below (<50)

Model Family · OpenAI o3

$2.00/M in200Kctx33 benchmarks

o3 Deep ResearchOct 2025

$10.00/M in(+8)200Kctx

o3 MiniJan 2025

$1.10/M in(-8.90)200Kctx17 benchmarks

o3 Mini HighFeb 2025

$1.10/M in200Kctx2 benchmarks

$20.00/M in(+18.90)200Kctx8 benchmarks

See the full o3 family →

Similar Models

Links

Info

OpenAI Pricing explorer Developers · API

Research

Technical Report

Documentation

API Docs Playground

Community

BenchGecko API

o3-mini

Specifications

Typetext
Context200K tokens (~100 books)
ReleasedJan 2025
LicenseProprietary
StatusActive
Cost / Message~$0.007

Available On

OpenAI$1.10

Categories

coding reasoning math knowledge

Learn More

context-window transformer tokens

Share & Export

Related Models

Gemini 2.5 Flash

Frequently Asked Questions

o3 Mini is a proprietary text AI model by OpenAI, released in January 2025. It has an average benchmark score of 37.8. Context window: 200K tokens.

Related Models

GLM 4 32B · z-ai Llama 3.2 90B · Meta GPT-4o-mini · OpenAI Gemini 2.5 Flash · Google DeepMind Mistral Nemo · Mistral AI

Benchmarks

Chatbot Arena Elo — Overall MATH level 5 OTIS Mock AIME 2024-2025 GPQA diamond Lech Mazur Writing

Related Pages

OpenAI · Provider OpenAI · Economy All Models Compare Models Pricing Developers · API Context Window · Glossary