Better than 36% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Proprietary
Benchmarks: 8 tested
Data updated today
About
Tested on 8 benchmarks with 33.7% average. Top scores: MMLU (79.5%), Winogrande (77.0%), MATH level 5 (37.5%).
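The page does not state how the 33.7% average is computed; assuming it is an unweighted mean of per-benchmark scores, it can be reproduced as below. Only seven of the eight benchmark scores appear on this page, so the result of this sketch will not exactly match the published 8-benchmark figure:

```python
# Seven of the eight benchmark scores visible on this page (the eighth is not listed).
scores = {
    "MMLU": 79.5,
    "Winogrande": 77.0,
    "MATH level 5": 37.5,
    "WeirdML": 23.2,
    "Cybench": 10.0,
    "SimpleBench": 8.2,
    "OTIS Mock AIME 2024-2025": 4.6,
}

# Unweighted mean (an assumption: the averaging method is not stated on the page).
average = sum(scores.values()) / len(scores)
print(f"Mean of {len(scores)} visible scores: {average:.1f}")
```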
Capabilities
- coding: 16.6 (#130 globally)
- reasoning: 8.2 (#148 globally)
- math: 21.1 (#156 globally)
- knowledge: 62.0 (#36 globally)
Benchmark Scores
Tested on 8 benchmarks · Ranked across 4 categories
Score Distribution (all 233 models)
coding
WeirdML: 23.2
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Cybench: 10.0
Capture-the-flag cybersecurity challenges. Tests vulnerability analysis, reverse engineering, cryptography, and exploitation skills.
reasoning
SimpleBench: 8.2
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
math
MATH level 5: 37.5
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
OTIS Mock AIME 2024-2025: 4.6
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
claude-3-opus
Specifications
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only
Frequently Asked Questions
What is Claude 3 Opus?
Claude 3 Opus is a proprietary text AI model by Anthropic, released in January 2024. It has an average benchmark score of 38.4.