Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input.
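The "17B active of 109B total" figure comes from MoE routing: a gating network selects a small subset of expert weights per token, so only a fraction of the model runs on any forward pass. A minimal sketch of the idea, assuming simple top-1 routing over 16 experts (illustrative only, not Meta's actual implementation):

```python
# Toy mixture-of-experts layer: a gate picks one expert per token, so
# most expert parameters stay inactive on each forward pass.
# Sizes and top-1 routing are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts = 8, 16          # toy width; 16 experts as in Scout (16E)
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate = rng.standard_normal((d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its highest-scoring expert (top-1 gating)."""
    scores = x @ gate                   # (tokens, n_experts) routing scores
    chosen = scores.argmax(axis=-1)     # index of the winning expert per token
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        out[i] = x[i] @ experts[e]      # only one expert's weights run per token
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)
print(y.shape)  # (4, 8)
```

With top-1 gating, each token touches one of the 16 expert matrices, which is why active parameters (17B) are far below total parameters (109B).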
Tested on 11 benchmarks with an 18.9% average score. Top scores: MATH Level 5 (62.3%), Fiction.LiveBench (36.0%), GPQA Diamond (35.8%).
Llama 3.2 3B Instruct (free) scores 14.7 (96% as good) at $0.00/1M input tokens (100% cheaper).
SWE-bench Verified solved using only bash commands, no specialized frameworks. Tests raw terminal-based problem solving.
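A bash-only harness of this kind reduces to a loop that executes model-proposed shell commands and feeds back their output. A hypothetical sketch of the execution step (the helper name and structure are assumptions, not the benchmark's actual harness):

```python
# Hypothetical bash-only evaluation step: run one shell command the model
# proposed and capture everything it printed, with a timeout as a guard.
import subprocess

def run_bash(command: str, timeout: int = 60) -> str:
    """Execute a single bash command and return its combined stdout/stderr."""
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr

# Example: the model might first inspect the repository before editing.
print(run_bash("echo inspecting repo"))
```

The harness would call `run_bash` once per model turn, appending the output to the conversation, until the model declares the task done or a step budget runs out.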
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.
ARC-AGI 2, the harder sequel to ARC. More complex abstract reasoning patterns that test generalization ability beyond training data.
MATH Level 5: competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
- Type: multimodal
- Context: 328K tokens (~164 books)
- Released: Apr 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000