Meta's latest class of model (Llama 3.1) launched in a variety of sizes and flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...
Tested on 16 benchmarks with a 27.4% average. Top scores: Chatbot Arena Elo, Overall (1211), GSM8K (82.4%), PIQA (62.4%).
Gemma 3 27B (free) scores 35.0 (102% as good) at $0.00/1M input tokens · 100% cheaper
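A comparison row like the one above can be reproduced with simple arithmetic. A minimal sketch, assuming the site divides the other model's benchmark average by this model's; the 34.3 baseline and $0.05/1M reference cost are inferred for illustration, not stated on the page:

```python
# Hypothetical reconstruction of the comparison row's figures.
# 34.3 is inferred from "35.0 (102% as good)"; $0.05/1M is an assumed baseline cost.

def relative_quality(other_score: float, base_score: float) -> str:
    """Express another model's benchmark average as a percentage of this one's."""
    return f"{other_score / base_score:.0%} as good"

def cost_savings(other_cost: float, base_cost: float) -> str:
    """Percentage cheaper per 1M input tokens; free models come out 100% cheaper."""
    return f"{1 - other_cost / base_cost:.0%} cheaper"

print(relative_quality(35.0, 34.3))  # Gemma 3 27B vs. an assumed 34.3 baseline
print(cost_savings(0.00, 0.05))      # $0.00 vs. an assumed $0.05/1M input
```

Rounding to whole percentages matches the page's display convention (e.g. "102% as good", "100% cheaper").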
Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning requiring chaining multiple facts together.
Grade school math word problems. 8,500 problems testing multi-step arithmetic reasoning. A foundational math benchmark.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced multi-step reasoning.
- Type: text
- Context: 16K tokens
- Released: Jul 2024
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000