
Claude Opus 4

by Anthropic · Released May 2025

Multimodal · 46.0 avg score · Rank #120 of 233 · Better than 48% of all models
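
The percentile line appears to fall straight out of the rank. A minimal sketch of the presumed arithmetic (the formula is an assumption; BenchGecko does not document how it derives the percentage):

    # Presumed derivation of the "Better than 48%" line from the rank.
    # The formula itself is an assumption; BenchGecko does not document it.
    total_models = 233
    rank = 120  # 1 = best

    better_than = (total_models - rank) / total_models
    print(f"Better than {better_than:.0%} of all models")  # Better than 48% of all models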
Context: 200K tokens (~150,000 words)
Input: $15.00 / 1M tokens
Output: $75.00 / 1M tokens
Type: multimodal
License: Proprietary
Benchmarks: 19 tested
Data updated today
About

At the time of its release, Claude Opus 4 was benchmarked as the world's best coding model, delivering sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in...

Tested on 19 benchmarks with a 41.7% average. Top scores: MATH level 5 (85.0%), Aider polyglot (72.0%), SWE-bench Verified (70.7%).

Looking for similar performance at lower cost?
Qwen3 235B A22B Instruct 2507 scores 45.7 (99% as good) at $0.07/1M input · ~99.5% cheaper
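
The comparison line packs two ratios into one sentence. A quick sketch of the arithmetic behind it (the ratio definitions are my assumption, not documented by BenchGecko):

    # Arithmetic behind the Qwen3 comparison line; ratio definitions assumed.
    opus_score, qwen_score = 46.0, 45.7
    opus_input, qwen_input = 15.00, 0.07  # USD per 1M input tokens

    print(f"{qwen_score / opus_score:.0%} as good")      # 99% as good
    print(f"{1 - qwen_input / opus_input:.1%} cheaper")  # 99.5% cheaper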
Capabilities
  • coding: 49.8 (#67 globally)
  • reasoning: 31.6 (#81 globally)
  • math: 39.5 (#106 globally)
  • knowledge: 40.1 (#151 globally)
Benchmark Scores
Tested on 19 benchmarks · Ranked across 4 categories
[Score distribution chart across all 233 models, with Claude Opus 4's position marked]
Aider polyglot · 72.0
Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

SWE-bench Verified · 70.7
Real-world software engineering tasks from GitHub issues. Models must diagnose bugs and write patches that pass test suites. Human-verified subset of SWE-bench.

SWE-bench Verified (Bash Only) · 67.6
SWE-bench Verified solved using only bash commands, no specialized frameworks. Tests raw terminal-based problem solving.

SimpleBench · 50.6
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

ARC-AGI · 35.7
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern-recognition puzzles. Core measure of general intelligence.

ARC-AGI-2 · 8.6
Harder sequel to ARC-AGI. More complex abstract reasoning patterns that test generalization beyond the training data.

MATH level 5 · 85.0
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

OTIS Mock AIME 2024-2025 · 64.4
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.

FrontierMath-2025-02-28-Private · 4.5
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
Score bands: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
Links
  • Documentation
  • Community
  • BenchGecko API
  • API model ID: claude-opus-4
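
For reference, a minimal sketch of querying the model through the Anthropic Python SDK, assuming the slug above is accepted as a model ID (Anthropic's dated identifiers, e.g. claude-opus-4-20250514, may be required instead) and that ANTHROPIC_API_KEY is set in the environment:

    # Minimal sketch: one message to Claude Opus 4 via the Anthropic SDK.
    # "claude-opus-4" is the slug shown above; a dated ID may be required.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-opus-4",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarize SWE-bench Verified in one sentence."}],
    )
    print(response.content[0].text)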
Specifications
  • Type: multimodal
  • Context: 200K tokens (~150,000 words)
  • Released: May 2025
  • License: Proprietary
  • Status: Active
  • Cost / Message: ~$0.105
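
The per-message figure follows from the token prices above. A minimal sketch, assuming a typical message of roughly 2,000 input and 1,000 output tokens (the token mix is my assumption; BenchGecko does not publish the one it uses):

    # Reproduce the ~$0.105/message estimate from the per-token prices.
    # The assumed token counts are illustrative, not BenchGecko's actual mix.
    INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens
    OUTPUT_PRICE_PER_M = 75.00  # USD per 1M output tokens

    input_tokens = 2_000   # assumed prompt size
    output_tokens = 1_000  # assumed completion size

    cost = (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    print(f"${cost:.3f} per message")  # $0.105 per message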
Available On
  • Anthropic: $15.00 / 1M input tokens
Claude Opus 4 is a proprietary multimodal AI model by Anthropic, released in May 2025. It has an average benchmark score of 46.0. Context window: 200K tokens.