Meta's latest class of models (Llama 3) launched in a variety of sizes and flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...
Tested on 9 benchmarks with a 32.4% average. Top scores: Chatbot Arena Elo, Overall (1275.1), MMLU (72.4%), Winogrande (67.0%).
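The Chatbot Arena figure is a rating on an open-ended scale, so it reads differently from the accuracy scores listed beside it. As a rough sketch, assuming the standard Elo expected-score formula (the leaderboard's exact fitting procedure is not described on this page), a rating gap can be converted into an expected head-to-head win rate. The function name `expected_score` and the opponent rating below are hypothetical, chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win rate of A against B under the standard Elo model.

    Assumption: the conventional base-10, 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    model_rating = 1275.1        # Chatbot Arena Elo, Overall, from this page
    opponent_rating = 1200.0     # hypothetical opponent, for illustration only
    win_rate = expected_score(model_rating, opponent_rating)
    print(f"Expected win rate vs. a {opponent_rating:.0f}-rated model: {win_rate:.1%}")
```

Two models with equal ratings get an expected win rate of 50%, and only the difference between ratings matters, not their absolute values.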
Cheaper alternative: Llama 3.3 70B Instruct (free) scores 29.6 (99% as good) at $0.00 per 1M input tokens · 100% cheaper
Capture-the-flag cybersecurity challenges. Tests vulnerability analysis, reverse engineering, cryptography, and exploitation skills.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests performance on competition mathematics.
Massive Multitask Language Understanding. 57 subjects from STEM, humanities, and social sciences. The most widely cited knowledge benchmark.
Commonsense coreference resolution. Tests understanding of pronoun references in ambiguous sentences.
Chinese MMLU. Comprehensive knowledge test specifically designed for Chinese language and culture.
- Type: text
- Context: 8K tokens (roughly 6,000 English words)
- Released: Apr 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.002
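A per-message cost like the one listed above is typically derived from token counts and per-million-token prices. The sketch below shows that arithmetic along with a context-window check. The function names, the example prices, the example token counts, and the reading of "8K" as 8,192 tokens are all assumptions for illustration; none of them are figures from this page.

```python
def estimate_message_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated cost in USD of one request/response pair."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int, context_window: int = 8192) -> bool:
    """True if prompt plus completion fit in the context window.

    Assumption: "8K tokens" means 8,192 tokens shared by input and output.
    """
    return input_tokens + output_tokens <= context_window


if __name__ == "__main__":
    # Hypothetical prices and message sizes, for illustration only.
    cost = estimate_message_cost(
        input_tokens=1500,
        output_tokens=500,
        input_price_per_million=0.50,
        output_price_per_million=0.75,
    )
    print(f"Estimated cost per message: ${cost:.6f}")
    print("Fits in context:", fits_context(1500, 500))
```

Substituting a provider's actual prices and typical message lengths gives an estimate comparable to the listed per-message figure.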