The 2024-08-06 version of GPT-4o offers improved performance on structured outputs, with the ability to supply a JSON schema in the response_format parameter. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/). GPT-4o ("o" for "omni") is...
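As a sketch of what supplying a JSON schema in response_format looks like, the snippet below builds a request payload for the 2024-08-06 snapshot. The payload shape follows OpenAI's documented `{"type": "json_schema", ...}` convention; the event schema, schema name, and message content are illustrative assumptions, not from the announcement.

```python
import json

# Illustrative schema: extract an event from free text. strict mode requires
# "additionalProperties": False and every property listed in "required".
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "attendees": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "attendees"],
    "additionalProperties": False,
}

request_body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Alice and Bob meet Friday."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "event",   # illustrative schema name
            "strict": True,    # enforce exact schema adherence
            "schema": event_schema,
        },
    },
}

print(json.dumps(request_body["response_format"], indent=2))
```

With `strict: True`, the model's reply is guaranteed to parse against the supplied schema, so the calling code can deserialize it without defensive validation.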
Tested on 11 benchmarks with a 35.6% average. Top scores: Chatbot Arena Elo (Overall) 1334.3, MMLU 79.1%, Aider Code Editing 71.4%.
ERNIE 4.5 21B A3B Thinking scores 39.8 (101% as good) at $0.07/1M input · 97% cheaper
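The "97% cheaper" figure above follows from comparing input prices. A minimal sketch, assuming GPT-4o's commonly cited $2.50 per 1M input tokens as the baseline (the baseline price is an assumption; only the $0.07 figure appears on this page):

```python
GPT4O_INPUT = 2.50   # USD per 1M input tokens (assumed baseline)
ERNIE_INPUT = 0.07   # USD per 1M input tokens (from the comparison above)

pct_cheaper = (1 - ERNIE_INPUT / GPT4O_INPUT) * 100
print(f"{pct_cheaper:.0f}% cheaper")  # → 97% cheaper
```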
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
Computer-aided design evaluation. Tests understanding of CAD concepts, 3D modeling, and engineering design principles.
Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
- Type: multimodal
- Context: 128K tokens (~64 books)
- Released: Aug 2024
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.015
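A rough sketch of how a ~$0.015 per-message figure could arise, assuming the commonly cited gpt-4o-2024-08-06 pricing of $2.50 per 1M input tokens and $10.00 per 1M output tokens (both the pricing and the example token counts are assumptions, not stated on this page):

```python
INPUT_PER_TOKEN = 2.50 / 1_000_000   # assumed USD per input token
OUTPUT_PER_TOKEN = 10.00 / 1_000_000  # assumed USD per output token

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request/response round trip."""
    return input_tokens * INPUT_PER_TOKEN + output_tokens * OUTPUT_PER_TOKEN

# A hypothetical chat turn: ~1,000 prompt tokens, ~1,250 completion tokens.
print(round(message_cost(1_000, 1_250), 4))  # → 0.015
```

Output tokens dominate the total here, so per-message cost is driven mostly by how long the model's replies are.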