Better than 41% of all models
Context: N/A
Input $/1M: TBD
Output $/1M: TBD
Type: text
License: Proprietary
Benchmarks: 5 tested
About
Tested on 5 benchmarks with 37.2% average. Top scores: TriviaQA (87.5%), MMLU (71.3%), GPQA diamond (12.9%).
Capabilities
math: 7.1 (#194 globally)
knowledge: 57.2 (#64 globally)
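The 37.2% overall average and the two capability scores above are consistent with plain unweighted means of the per-benchmark results listed under Benchmark Scores below. That averaging rule is an inference from the numbers, not something the page states; a minimal Python sketch:

```python
from statistics import mean

# Per-benchmark scores as listed in the Benchmark Scores section of this page.
math_scores = {"MATH level 5": 11.7, "OTIS Mock AIME 2024-2025": 2.4}
knowledge_scores = {"TriviaQA": 87.5, "MMLU": 71.3, "GPQA diamond": 12.9}

all_scores = list(math_scores.values()) + list(knowledge_scores.values())

print(f"{mean(all_scores):.1f}")                 # 37.2 -- the About line's average
print(f"{mean(knowledge_scores.values()):.1f}")  # 57.2 -- the knowledge score
print(f"{mean(math_scores.values()):.2f}")       # 7.05 -- displayed as 7.1 on this page
```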
Benchmark Scores
Tested on 5 benchmarks · Ranked across 2 categories
Score Distribution (all 233 models): chart not reproduced here; 0-100 axis with this model's position marked.
math
MATH level 5: 11.7
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
OTIS Mock AIME 2024-2025: 2.4
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
knowledge
TriviaQA: 87.5
Trivia questions sourced from trivia enthusiasts and quiz websites. Tests breadth of general knowledge.
MMLU: 71.3
Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely cited knowledge benchmark.
GPQA diamond: 12.9
Graduate-level science questions written by PhD experts. The Diamond subset is the hardest tier: questions that experts answer correctly but skilled non-experts get wrong, testing deep understanding.
Score bands: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
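Taken literally, the legend defines four score bands. A tiny classifier under the assumption that each band owns its lower endpoint (the printed intervals meet at 50, 70, and 85, so endpoint ownership is a guess):

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to one of the legend's four bands.

    Boundary ownership (>= vs >) at 50/70/85 is an assumption; the
    legend does not say which band the exact thresholds belong to.
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"

# Applied to this model's scores: TriviaQA 87.5 -> Excellent,
# MMLU 71.3 -> Good, GPQA diamond 12.9 -> Below.
print(score_band(87.5), score_band(71.3), score_band(12.9))
```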
BenchGecko API
Model ID: claude-2
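The page exposes a model ID for the BenchGecko API but does not document the API itself. The sketch below is therefore hypothetical end to end: the host, route shape, and response fields are invented for illustration, and only the claude-2 ID comes from this page.

```python
import json
import urllib.request

# Hypothetical base URL: BenchGecko's real API routes are not documented here.
BASE_URL = "https://api.benchgecko.example/v1"

def fetch_model(model_id: str) -> dict:
    """Fetch a model card by its BenchGecko model ID (e.g. 'claude-2')."""
    with urllib.request.urlopen(f"{BASE_URL}/models/{model_id}") as resp:
        return json.load(resp)

card = fetch_model("claude-2")
# Assumed response fields, mirroring this page's layout:
print(card["average"], card["capabilities"]["knowledge"]["rank"])
```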
Specifications
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Proprietary
- Status: benchmark-only
Frequently Asked Questions
What is Claude 2?
Claude 2 is a proprietary, text-only AI model from Anthropic, released in January 2024. It has an average benchmark score of 41.8.