OpenAI's smartest model · GPT-5.5. Same speed as GPT-5.4. Plans, uses tools, checks its own work. Tops Terminal-Bench 2.0 (82.7%), GDPval (84.9%), ARC-AGI-2 (85.0%), CyberGym (81.8%). SOTA on AA Coding Index at half the cost.
Tested on 6 benchmarks with an 85.0% average. Top scores: ARC-AGI (95.0%), GPQA Diamond (93.6%), BrowseComp (84.4%).
Qwen2.5 72B Instruct scores 65.8 (100% as good) at $0.12/1M input · 98% cheaper
Complex terminal-based engineering tasks. Models must use command-line tools, navigate filesystems, and debug systems through shell interaction.
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern-recognition puzzles. Designed as a measure of general reasoning ability rather than memorized skill.
Graduate-level science questions written by PhD-level domain experts. The Diamond subset keeps only questions that both expert validators answered correctly and most skilled non-experts answered incorrectly, testing deep understanding.
- Type: text
- Context: 400K tokens (~200 books)
- Released: Apr 2026
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.040
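
The cost figures in this listing (a per-message estimate and a "98% cheaper" comparison) come down to simple arithmetic on per-million-token prices. The listing does not say how it derives its own numbers, so the following is only a minimal sketch. The token counts, the output price, and the reference input price are hypothetical values chosen for easy arithmetic; only the $0.12/1M input price appears in the listing.

```python
def message_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one message, given prices per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def percent_cheaper(candidate_price, reference_price):
    """How much cheaper the candidate is than the reference, in percent."""
    return (1 - candidate_price / reference_price) * 100


# Hypothetical reference prices (assumptions, NOT taken from the listing).
REFERENCE_INPUT_PRICE = 6.00    # $ per 1M input tokens, assumed
REFERENCE_OUTPUT_PRICE = 30.00  # $ per 1M output tokens, assumed

# The only price stated in the listing.
CANDIDATE_INPUT_PRICE = 0.12    # $ per 1M input tokens

# Hypothetical message size: 1,000 input tokens and 500 output tokens.
cost = message_cost(1_000, 500, REFERENCE_INPUT_PRICE, REFERENCE_OUTPUT_PRICE)
savings = percent_cheaper(CANDIDATE_INPUT_PRICE, REFERENCE_INPUT_PRICE)

print(f"cost per message: ${cost:.3f}")
print(f"input price savings: {savings:.0f}%")
```

A rounded percentage such as "98% cheaper" is compatible with a range of reference prices, so it cannot be inverted to recover an exact price.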