The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.
Tested on 12 benchmarks with 51.0% average. Top scores: HellaSwag (93.7%), GSM8K (90.0%), TriviaQA (84.8%).
Qwen2.5 Coder 7B Instruct scores 56.0 (100% as good) at $0.03/1M input · 100% cheaper
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
- Typemultimodal
- Context128K tokens (~64 books)
- ReleasedApr 2024
- LicenseProprietary
- StatusActive
- Cost / Message~$0.050