How much does GPT-4o (2024-11-20) cost?

GPT-4o (2024-11-20) costs $2.50 per million input tokens and $10.00 per million output tokens. For a typical conversation (~2,000 tokens), that's approximately $0.015 per message.

What benchmarks has GPT-4o (2024-11-20) been tested on?

GPT-4o (2024-11-20) has been evaluated on 28 benchmarks. Top scores: ScienceQA: 84.7, HELM — WildBench: 82.8, Lech Mazur Writing: 81.8.

Is GPT-4o (2024-11-20) open source?

No, GPT-4o (2024-11-20) is a proprietary model by OpenAI.

How does GPT-4o (2024-11-20) compare to Qwen2.5 Coder 1.5B Instruct?

GPT-4o (2024-11-20) has an average score of 36.3 while Qwen2.5 Coder 1.5B Instruct scores 36.5. Qwen2.5 Coder 1.5B Instruct slightly outperforms GPT-4o (2024-11-20) overall. See full comparison →

Home/Models/GPT-4o (2024-11-20)

GPT-4o (2024-11-20)

Name: GPT-4o (2024-11-20)
Price: 2.5 USD
Author: OpenAI

by OpenAI · Released Nov 2024

Multimodal

36.3

avg score

Rank #160

Compare

Better than 31% of all models

Context

128K tokens (~64 books)

Input $/1M

$2.50

Output $/1M

$10.00

Type

multimodal

License

Proprietary

Benchmarks

28 tested

Data updated today

About

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded...

Tested on 28 benchmarks with 37.7% average. Top scores: ScienceQA (84.7%), HELM — WildBench (82.8%), Lech Mazur Writing (81.8%).

Looking for similar performance at lower cost?
Mistral Nemo scores 37.4 (103% as good) at $0.02/1M input · 99% cheaper

Capabilities

coding

26.4

#121 globally

reasoning

22.2

#99 globally

math

22.3

#151 globally

knowledge

57.2

#66 globally

agentic

8.6

#28 globally

multimodal

62.5

#4 globally

language

81.7

#46 globally

Benchmark Scores

Compare All

Tested on 28 benchmarks · Ranked across 7 categories

Score Distribution (all 233 models)

0255075100

▲ You are here

codingCompare coding →

Aider — Code Editing

Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.

71.4—

SWE-Bench verified

Real-world software engineering tasks from GitHub issues. Models must diagnose bugs and write patches that pass test suites. Human-verified subset of SWE-bench.

31.0—

CadEval

Computer-aided design evaluation. Tests understanding of CAD concepts, 3D modeling, and engineering design principles.

26.0—

reasoningCompare reasoning →

HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

82.8—

ARC-AGI

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

4.5—

SimpleBench

Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

1.4—

mathCompare math →

MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

53.3—

HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

29.3—

OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

6.3—

Quick compare:

vs Qwen2.5 Coder 1.5B Instruct

vs Qwen2.5-Max

vs Llama 3.2 3B Instruct

Excellent (85+) Good (70-85) Average (50-70) Below (<50)

Model Family · OpenAI GPT-4o

GPT-4oMay 2024

$2.50/M in128Kctx

GPT-4o (2024-05-13)May 2024

51.1+51.1

$5.00/M in(+2.50)128Kctx8 benchmarks

GPT-4o (2024-08-06)Aug 2024

35.6-15.5

$2.50/M in(-2.50)128Kctx11 benchmarks

GPT-4o (2024-11-20)Nov 2024

37.7+2.1

$2.50/M in128Kctx28 benchmarks

GPT-4o (extended)May 2024

$6.00/M in(+3.50)128Kctx

GPT-4o AudioAug 2025

$2.50/M in(-3.50)128Kctx

GPT-4o Search PreviewMar 2025

$2.50/M in128Kctx

GPT-4o-miniJul 2024

39.6+39.6

$0.15/M in(-2.35)128Kctx15 benchmarks

GPT-4o-mini (2024-07-18)Jul 2024

43.2+3.6

$0.15/M in128Kctx20 benchmarks

GPT-4o-mini Search PreviewMar 2025

$0.15/M in128Kctx

See the full GPT-4o family →

Similar Models

Qwen2.5 Coder 1.5B Instruct

Llama 3.2 3B Instruct

Frequently Asked Questions

GPT-4o (2024-11-20) is a proprietary multimodal AI model by OpenAI, released in November 2024. It has an average benchmark score of 36.3. Context window: 128K tokens.

Benchmarks

ScienceQA HELM — WildBench Lech Mazur Writing HELM — IFEval MMLU

OpenAI · Provider OpenAI · Economy All Models Compare Models Pricing Developers · API Context Window · Glossary

GPT-4o (2024-11-20)

Frequently Asked Questions

Related Models

Benchmarks

Related Pages