Better than 36% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Proprietary
Benchmarks: 8 tested
Data updated today
About
Tested on 8 benchmarks with 33.7% average. Top scores: MMLU (79.5%), Winogrande (77.0%), MATH level 5 (37.5%).
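The page does not state how the 33.7% average is computed; assuming it is an unweighted mean of per-benchmark scores, it can be reproduced as below. Only seven of the eight benchmark scores appear on this page, so the result of this sketch will not exactly match the published 8-benchmark figure:

```python
# Seven of the eight benchmark scores visible on this page (the eighth is not listed).
scores = {
    "MMLU": 79.5,
    "Winogrande": 77.0,
    "MATH level 5": 37.5,
    "WeirdML": 23.2,
    "Cybench": 10.0,
    "SimpleBench": 8.2,
    "OTIS Mock AIME 2024-2025": 4.6,
}

# Unweighted mean (an assumption: the averaging method is not stated on the page).
average = sum(scores.values()) / len(scores)
print(f"Mean of {len(scores)} visible scores: {average:.1f}")
```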
Capabilities
- coding: 16.6 (#130 globally)
- reasoning: 8.2 (#148 globally)
- math: 21.1 (#156 globally)
- knowledge: 62.0 (#36 globally)
Benchmark Scores
Tested on 8 benchmarks · Ranked across 4 categories
Score Distribution (all 233 models)
coding
WeirdML: 23.2
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Cybench: 10.0
Capture-the-flag cybersecurity challenges. Tests vulnerability analysis, reverse engineering, cryptography, and exploitation skills.
reasoning
SimpleBench: 8.2
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
math
MATH level 5: 37.5
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
OTIS Mock AIME 2024-2025: 4.6
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
claude-3-opus
Specifications
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only
Frequently Asked Questions
What is Claude 3 Opus?
Claude 3 Opus is a proprietary text AI model by Anthropic, released in January 2024. It has an average benchmark score of 38.4.