Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...
Tested on 10 benchmarks with a 37.1% average score. Top scores: MATH level 5 (96.4%), OTIS Mock AIME 2024-2025 (66.6%), GPQA diamond (61.6%).
ERNIE 4.5 21B A3B Thinking scores 39.8 (107% as good) at $0.07/1M input · 93% cheaper
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
Complex terminal-based engineering tasks. Models must use command-line tools, navigate filesystems, and debug systems through shell interaction.
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.
ARC-AGI 2, harder sequel to ARC. More complex abstract reasoning patterns that test generalization ability beyond training data.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
- Type: multimodal
- Context: 200K tokens (~100 books)
- Released: Oct 2025
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.007
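A per-message figure like the one above can be derived from per-million-token rates. The sketch below shows the arithmetic; the token counts and dollar rates in it are illustrative assumptions, not prices quoted on this page.

```python
# Sketch: estimating a rough per-message cost from per-million-token pricing.
# All rates and token counts here are hypothetical, chosen for illustration.
def message_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one message at the given per-1M-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# e.g. a ~2K-token prompt with a ~1K-token reply at assumed
# $1/1M input and $5/1M output rates:
cost = message_cost(2_000, 1_000, 1.00, 5.00)
print(f"${cost:.4f}")  # → $0.0070
```

Under those assumed rates, a typical short exchange lands in the same ballpark as the ~$0.007 figure listed; actual cost scales linearly with prompt and completion length.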