Tested on 9 benchmarks with a 41.5% average score. Top scores: Chatbot Arena Elo — Overall (1387.7 Elo), MATH level 5 (81.7%), Aider — Code Editing (79.7%).
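For context on how a headline figure like this is usually aggregated, here is a minimal sketch: a plain mean over the percentage-scored benchmarks, with the Chatbot Arena Elo kept separate because it is a rating rather than a percentage. Apart from the two scores quoted above, the values are placeholders, not actual results.

```python
# Minimal sketch of the headline aggregation: a plain mean over the
# percentage-scored benchmarks. The Chatbot Arena Elo is a rating, not a
# percentage, so it is reported on its own rather than folded into the mean.
percent_scores = {
    "Aider - Code Editing": 79.7,  # quoted in the summary above
    "MATH level 5": 81.7,          # quoted in the summary above
    # ...the remaining percentage-scored benchmarks would be listed here
}
elo_ratings = {"Chatbot Arena Elo - Overall": 1387.7}

average = sum(percent_scores.values()) / len(percent_scores)
print(f"Average over {len(percent_scores)} benchmarks: {average:.1f}%")
print(f"Chatbot Arena Elo - Overall: {elo_ratings['Chatbot Arena Elo - Overall']:.1f}")
```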
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
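As a rough illustration of the kind of edit this benchmark scores (the file name and snippets are hypothetical, and this is not Aider's actual edit format or harness), here is a small targeted change rendered as a unified diff:

```python
import difflib

# Hypothetical before/after versions of a small function; the edit changes
# only the targeted line while keeping the surrounding style intact.
before = "def greet(name):\n    print('Hello ' + name)\n"
after = "def greet(name):\n    print(f'Hello {name}')\n"

diff = difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="greet.py",  # hypothetical file name
    tofile="greet.py",
)
print("".join(diff))
```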
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Capture-the-flag cybersecurity challenges. Tests vulnerability analysis, reverse engineering, cryptography, and exploitation skills.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and exposes reasoning gaps.
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern-recognition puzzles. Intended as a core measure of general intelligence.
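For a concrete sense of the format, ARC tasks are published as JSON with "train" and "test" lists of input/output integer grids, where each integer encodes a color. A toy example in that structure (the transformation, swapping colors 1 and 2, is a made-up placeholder, not a real ARC puzzle):

```python
# Toy task in the ARC JSON structure: "train" and "test" pairs of integer
# grids, where each integer encodes a color. The rule here (swap colors
# 1 and 2) is a made-up placeholder, not an actual ARC puzzle.
toy_task = {
    "train": [
        {"input": [[1, 2], [2, 1]], "output": [[2, 1], [1, 2]]},
        {"input": [[1, 1], [2, 2]], "output": [[2, 2], [1, 1]]},
    ],
    "test": [
        {"input": [[2, 1], [1, 1]]},  # solver must produce [[1, 2], [2, 2]]
    ],
}
```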
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only