GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and...
Tested on 8 benchmarks, with an average score of 43.3%. Top scores: ARC-AGI (70.2%), WeirdML (60.4%), SimpleBench (53.9%).
Phi 4 scores 54.2 (102% as good) at $0.07/1M input · 100% cheaper
WeirdML: Unusual and adversarial machine learning challenges. Tests how robustly a model reasons about edge cases in ML systems.
ARC-AGI: Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Intended as a core measure of general intelligence.
SimpleBench: Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and exposes reasoning gaps.
ARC-AGI 2: Harder sequel to ARC, with more complex abstract reasoning patterns that test generalization beyond training data.
FrontierMath (hardest tier): Problems at the frontier of human mathematical ability, many of which most mathematicians cannot solve.
- Type: multimodal
- Context: 400K tokens (~200 books)
- Released: Oct 2025
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.150