GPT-4-0314 is the first released version of GPT-4, with a context length of 8,192 tokens. It was supported until June 14. Training data: up to Sep 2021.
Tested on 7 benchmarks with a 55.0% average. Top scores: Chatbot Arena Elo, Overall (1285.8, an Elo rating rather than a percentage), GSM8K (92.0%), MMLU (81.9%).
Cheaper alternative: Qwen3 30B A3B Thinking 2507 scores 63.5 (rated 100% as good) at $0.08 per 1M input tokens, close to 100% cheaper.
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.
Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. One of the most widely cited knowledge benchmarks.
Commonsense coreference resolution. Tests understanding of pronoun references in ambiguous sentences.
Graduate-level science questions written by PhD experts. The Diamond subset keeps only the questions that expert validators answered correctly and most non-experts got wrong, testing deep understanding.
- Type: text
- Context: 8K tokens (roughly 6,000 English words)
- Released: May 2023
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.120
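
The page does not say how the per-message cost is derived. The sketch below shows one plausible way to estimate it from token counts and per-token prices. The prices used ($30 per 1M input tokens, $60 per 1M output tokens) are GPT-4 8K list prices and are not stated on this page. The message size (2,000 input tokens, 1,000 output tokens) and the function name `estimate_message_cost` are hypothetical, chosen for illustration only.

```python
def estimate_message_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost in USD of one request/response pair."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Assumed values, not taken from this page:
# prices in USD per 1M tokens, and a hypothetical "typical" message size.
INPUT_PRICE = 30.0
OUTPUT_PRICE = 60.0

cost = estimate_message_cost(
    input_tokens=2_000,
    output_tokens=1_000,
    input_price_per_million=INPUT_PRICE,
    output_price_per_million=OUTPUT_PRICE,
)
print(f"Estimated cost per message: ${cost:.3f}")
```

Under these assumptions the estimate lands at about $0.12 per message, in line with the listed figure. A different assumed message size would give a different number.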