How much does GPT-4o-mini cost?

GPT-4o-mini costs $0.15 per million input tokens and $0.60 per million output tokens. For a typical conversation (~2,000 tokens), that's approximately $0.001 per message.
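The per-message figure is simple arithmetic on the two token prices. Below is a minimal sketch of that calculation. It assumes a message's ~2,000 tokens split evenly into 1,000 input and 1,000 output tokens; the page does not state the split. The helper name `estimate_cost_usd` is hypothetical.

```python
# GPT-4o-mini list prices in USD per 1M tokens, as quoted above.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost of one request at GPT-4o-mini list prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Assumed split of a ~2,000-token message: 1,000 tokens in, 1,000 tokens out.
cost = estimate_cost_usd(1_000, 1_000)
print(f"${cost:.5f} per message")  # $0.00075, which rounds to about $0.001
```

Shifting the split toward output raises the cost, because output tokens are priced at four times the rate of input tokens.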

What benchmarks has GPT-4o-mini been tested on?

GPT-4o-mini has been evaluated on 15 benchmarks. Top scores: GSM8K: 91.3, PIQA: 77.4, MMLU: 75.7.

Is GPT-4o-mini open source?

No, GPT-4o-mini is a proprietary model by OpenAI.

How does GPT-4o-mini compare to Mistral Nemo?

GPT-4o-mini has an average score of 37.5, while Mistral Nemo scores 37.4. That puts GPT-4o-mini ahead overall by only 0.1 points. GPT-4o-mini costs $0.15 per 1M input tokens, compared with $0.02 per 1M for Mistral Nemo.


GPT-4o-mini

Name: GPT-4o-mini
Price: $0.15 per 1M input tokens
Author: OpenAI

by OpenAI · Released Jul 2024

Multimodal

Average score: 37.5 · Rank #157 · Better than 33% of all models
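The 33% figure can be reproduced from the rank. The page does not state its formula, so this sketch assumes "better than" counts the models ranked below GPT-4o-mini, out of the 233 models in the score distribution. The helper name is hypothetical.

```python
def share_of_models_beaten(rank: int, total_models: int) -> float:
    """Hypothetical helper: fraction of models ranked strictly below `rank` (1 = best)."""
    if not 1 <= rank <= total_models:
        raise ValueError("rank must be between 1 and total_models")
    return (total_models - rank) / total_models


# Rank #157 out of the 233 models in the score distribution.
share = share_of_models_beaten(157, 233)
print(f"Better than {share:.0%} of all models")  # Better than 33% of all models
```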

Context: 128K tokens (roughly 96,000 English words)
Input $/1M: $0.15
Output $/1M: $0.60
Type: multimodal
License: Proprietary
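You can turn the context window into a rough word count with a words-per-token ratio. This sketch assumes about 0.75 English words per token. That is a common rule of thumb, not an exact property of the tokenizer, and it varies by language and content. It also treats "128K" as 128,000 tokens.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # "128K" as listed above
WORDS_PER_TOKEN = 0.75           # assumed rule of thumb for English prose


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough number of English words that fit in `tokens` tokens."""
    return round(tokens * words_per_token)


print(approx_words(CONTEXT_WINDOW_TOKENS))  # 96000
```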

Benchmarks: 15 tested

About

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...
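The "many multiples more affordable" claim can be checked with one division, using the input prices listed on this page: $2.50/1M for GPT-4o and $0.15/1M for GPT-4o-mini. This sketch compares input prices only. It leaves out output prices and the other GPT-4o snapshots.

```python
GPT_4O_INPUT_PER_M = 2.50       # GPT-4o input price, from the family list on this page
GPT_4O_MINI_INPUT_PER_M = 0.15  # GPT-4o-mini input price

ratio = GPT_4O_INPUT_PER_M / GPT_4O_MINI_INPUT_PER_M
savings = 1 - GPT_4O_MINI_INPUT_PER_M / GPT_4O_INPUT_PER_M
print(f"{ratio:.1f}x cheaper on input ({savings:.0%} less per token)")
# 16.7x cheaper on input (94% less per token)
```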

Tested on 15 benchmarks with 39.6% average. Top scores: GSM8K (91.3%), PIQA (77.4%), MMLU (75.7%).

Looking for similar performance at lower cost?
Mistral Nemo scores 37.4 (99.7% of GPT-4o-mini's 37.5) at $0.02/1M input tokens, about 87% cheaper on input.
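Both percentages in this comparison are ratios of the page's own numbers, with GPT-4o-mini as the reference. This sketch covers average scores and input prices only, because the page does not list Mistral Nemo's output price. The helper names are hypothetical.

```python
def relative_score(score: float, reference: float) -> float:
    """Score as a fraction of the reference model's score."""
    return score / reference


def price_reduction(price: float, reference_price: float) -> float:
    """Fractional saving versus the reference price (0.87 means 87% cheaper)."""
    return 1 - price / reference_price


# Mistral Nemo vs. GPT-4o-mini, using the figures on this page.
print(f"{relative_score(37.4, 37.5):.1%} as good")             # 99.7% as good
print(f"{price_reduction(0.02, 0.15):.0%} cheaper on input")   # 87% cheaper on input
```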

Capabilities

coding: 23.7 (#123 globally)
reasoning: 0.1 (#185 globally)
math: 50.3 (#75 globally)
knowledge: 45.7 (#124 globally)
multimodal: 53.1 (#6 globally)

Benchmark Scores


Tested on 15 benchmarks · Ranked across 5 categories

[Figure: Score Distribution (all 233 models), 0 to 100 scale, with GPT-4o-mini's position marked]

coding

Aider — Code Editing

Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.

Score: 55.6

WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

Score: 11.8

Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

Score: 3.6

reasoning

ARC-AGI-2

ARC-AGI-2, the harder sequel to ARC. Uses more complex abstract reasoning patterns to test generalization beyond training data.

Score: 0.1

math

GSM8K

Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.

Score: 91.3

MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

Score: 52.6

OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

Score: 6.8

Quick compare:

vs Mistral Nemo

vs GLM 4 32B

vs Llama 3.2 90B

Score legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
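The legend maps each 0-100 score to a band. This sketch applies that mapping to the benchmark scores listed above. The legend's ranges share endpoints (70-85, 50-70), so assigning each endpoint to the higher band is an assumption. The function name `score_band` is hypothetical.

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to the legend's bands.

    Assumption: a score exactly on a shared endpoint (85, 70, 50) goes to the higher band.
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# GPT-4o-mini benchmark scores listed on this page.
scores = {
    "Aider (code editing)": 55.6,
    "WeirdML": 11.8,
    "Aider polyglot": 3.6,
    "ARC-AGI-2": 0.1,
    "GSM8K": 91.3,
    "MATH level 5": 52.6,
    "OTIS Mock AIME 2024-2025": 6.8,
}
for name, score in scores.items():
    print(f"{name}: {score} ({score_band(score)})")
```

Under this mapping, GSM8K is the only listed benchmark in the Excellent band. Aider code editing and MATH level 5 fall in Average, and the rest fall in Below.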

Model Family · OpenAI GPT-4o

GPT-4o · May 2024 · $2.50/M input · 128K context
GPT-4o (2024-05-13) · May 2024 · Score 51.1 · $5.00/M input · 128K context · 8 benchmarks
GPT-4o (2024-08-06) · Aug 2024 · Score 35.6 · $2.50/M input · 128K context · 11 benchmarks
GPT-4o (2024-11-20) · Nov 2024 · Score 37.7 · $2.50/M input · 128K context · 28 benchmarks
GPT-4o (extended) · May 2024 · $6.00/M input · 128K context
GPT-4o Audio · Aug 2025 · $2.50/M input · 128K context
GPT-4o Search Preview · Mar 2025 · $2.50/M input · 128K context
GPT-4o-mini · Jul 2024 · Score 39.6 · $0.15/M input · 128K context · 15 benchmarks
GPT-4o-mini (2024-07-18) · Jul 2024 · Score 43.2 · $0.15/M input · 128K context · 20 benchmarks
GPT-4o-mini Search Preview · Mar 2025 · $0.15/M input · 128K context


Frequently Asked Questions

GPT-4o-mini is a proprietary multimodal AI model by OpenAI, released in July 2024. It has an average benchmark score of 37.5. Context window: 128K tokens.

Benchmarks

GSM8K · PIQA · MMLU · Lech Mazur Writing · GeoBench
