How much does o1-mini cost?

o1-mini pricing information is not yet available.

What benchmarks has o1-mini been tested on?

o1-mini has been evaluated on 13 benchmarks. Top scores: Chatbot Arena Elo — Overall: 1336.6, MATH level 5: 89.2, Aider — Code Editing: 70.7.

Is o1-mini open source?

No, o1-mini is a proprietary model by OpenAI.

How does o1-mini compare to Claude 3 Sonnet?

o1-mini has an average score of 32.4 while Claude 3 Sonnet scores 32.0. o1-mini outperforms Claude 3 Sonnet overall. See full comparison →

Home/Models/o1-mini

o1-mini

Name: o1-mini
Author: OpenAI

by OpenAI · Released Jan 2024

32.4

avg score

Rank #171

Compare

Better than 27% of all models

Context

N/A

Input $/1M

TBD

Output $/1M

TBD

Type

text

License

Proprietary

Benchmarks

13 tested

Data updated today

About

Tested on 13 benchmarks with 34.9% average. Top scores: Chatbot Arena Elo — Overall (1336.6%), MATH level 5 (89.2%), Aider — Code Editing (70.7%).

Capabilities

coding

37.5

#103 globally

reasoning

5.5

#162 globally

math

45.9

#91 globally

knowledge

57.4

#63 globally

Benchmark Scores

Compare All

Tested on 13 benchmarks · Ranked across 5 categories

Score Distribution (all 233 models)

0255075100

▲ You are here

codingCompare coding →

Aider — Code Editing

Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.

70.7—

WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

36.3—

Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

32.9—

reasoningCompare reasoning →

ARC-AGI

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

14.0—

SimpleBench

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

1.7—

ARC-AGI-2

ARC-AGI 2, harder sequel to ARC. More complex abstract reasoning patterns that test generalization ability beyond training data.

0.8—

mathCompare math →

MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

89.2—

OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

46.9—

FrontierMath-2025-02-28-Private

Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.

1.7—

Quick compare:

vs Claude 3 Sonnet

vs Gemini 2.0 Flash Thinking (Jan 2025)

vs Meta Llama 3 8B Instruct

Excellent (85+) Good (70-85) Average (50-70) Below (<50)

Model Family · OpenAI o1

o1Dec 2024

56.4

$15.00/M in200Kctx14 benchmarks

o1-miniJan 2024

34.9-21.5

N/AN/Actx13 benchmarks

o1-previewJan 2024

41.5+6.6

N/AN/Actx9 benchmarks

o1-proMar 2025

$150.00/M in200Kctx

See the full o1 family →

Similar Models

Claude 3 Sonnet

Anthropic

32.0TBD

Gemini 2.0 Flash Thinking (Jan 2025)

Google DeepMind

32.9TBD

Meta Llama 3 8B Instruct

Frequently Asked Questions

o1-mini is a proprietary text AI model by OpenAI, released in January 2024. It has an average benchmark score of 32.4.

Benchmarks

Chatbot Arena Elo — Overall MATH level 5 Aider — Code Editing Lech Mazur Writing GPQA diamond

OpenAI · Provider OpenAI · Economy All Models Compare Models Pricing Developers · API

o1-mini

Frequently Asked Questions

Related Models

Benchmarks

Related Pages