
Llama 4 Maverick

by Meta · Released Apr 2025

Open Source · Multimodal · 1M Context

Average score: 22.0 · Rank #203 · Better than 13% of all models
Context: 1.0M tokens (~524 books)
Input price: $0.15 / 1M tokens
Output price: $0.60 / 1M tokens
Type: multimodal
License: Open Source
Benchmarks: 17 tested
Data updated today
About

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass.
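
The key property of that MoE design is that only a few experts fire per token, so the active parameter count stays far below the total. Below is a minimal, illustrative sketch of top-k expert routing in NumPy; the function names, toy sizes, and per-expert layer shape are assumptions for illustration, not Meta's actual implementation.

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route one token through a top-k mixture of experts.

    x        : (d,) token hidden state
    experts  : list of (W, b) pairs, one tiny feed-forward layer per expert
    router_w : (n_experts, d) router projection
    top_k    : experts actually evaluated per token; the rest stay idle
    """
    logits = router_w @ x                      # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]       # indices of the k best experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                   # softmax over the chosen k only
    # Only top_k experts run: with 128 experts and a small k, most of the
    # total parameter count is untouched for any given token.
    out = np.zeros_like(x)
    for w, i in zip(weights, chosen):
        W, b = experts[i]
        out += w * (W @ x + b)
    return out

rng = np.random.default_rng(0)
d, n_experts = 16, 8                           # toy sizes, not the real 128
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
print(moe_forward(rng.normal(size=d), experts, router_w).shape)  # (16,)
```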

Tested on 17 benchmarks with 28.0% average. Top scores: MATH level 5 (73.0%), Lech Mazur Writing (63.7%), GPQA diamond (56.0%).

Looking for similar performance at lower cost?
Llama 3.2 1B Instruct scores 19.9 (90% as good) at $0.03/1M input · 82% cheaper
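To see where those comparison numbers come from, the score ratio and price gap can be recomputed directly from the figures on this page. On input price alone the saving works out to 80%, so the page's 82% presumably blends input and output pricing; treat this as a sanity check, not the site's exact formula.

```python
# Figures taken from this page.
maverick = {"score": 22.0, "input_per_1m": 0.15}
llama_3_2_1b = {"score": 19.9, "input_per_1m": 0.03}

quality = llama_3_2_1b["score"] / maverick["score"]
saving = 1 - llama_3_2_1b["input_per_1m"] / maverick["input_per_1m"]

print(f"{quality:.0%} as good")                 # 90% as good
print(f"{saving:.0%} cheaper on input tokens")  # 80% cheaper on input alone
```
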
Capabilities
coding: 20.4 (#126 globally)
reasoning: 5.9 (#160 globally)
math: 31.4 (#127 globally)
knowledge: 43.8 (#130 globally)
speed: 22.9 (#52 globally)
Benchmark Scores
Tested on 17 benchmarks · Ranked across 5 categories
Score Distribution (all 233 models): histogram on a 0-100 scale, with this model's position marked.
WeirdML · 24.5
Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

SWE-Bench Verified (Bash Only) · 21.0
SWE-bench Verified solved using only bash commands, no specialized frameworks. Tests raw terminal-based problem solving.

Aider polyglot · 15.6
Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

SimpleBench · 13.2
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.

ARC-AGI · 4.4
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

ARC-AGI-2 · 0.1
Harder sequel to ARC. More complex abstract reasoning patterns that test generalization ability beyond training data.

MATH level 5 · 73.0
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

OTIS Mock AIME 2024-2025 · 20.5
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

FrontierMath-2025-02-28-Private · 0.7
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
Score legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
Links: Documentation · Community · BenchGecko API
Model ID: llama-4-maverick
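
For programmatic access, a request along the following lines could fetch this model's data. The endpoint URL, response fields, and lack of authentication are all assumptions for illustration; only the llama-4-maverick model ID comes from this page, so check the BenchGecko API documentation for the real interface.

```python
import json
import urllib.request

# Hypothetical endpoint: the real BenchGecko API path may differ.
URL = "https://api.benchgecko.example/v1/models/llama-4-maverick"

with urllib.request.urlopen(URL) as resp:   # plain GET, no auth assumed
    model = json.load(resp)

# Field names below are guesses at a typical JSON shape, not documented ones.
print(model.get("name"), model.get("avg_score"))
```
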
Specifications
  • Type: multimodal
  • Context: 1.0M tokens (~524 books)
  • Released: Apr 2025
  • License: Open Source
  • Status: Active
  • Cost / Message: ~$0.001 (see the sketch below)
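
A minimal sketch of how a per-message figure like ~$0.001 can fall out of the listed token prices. The message size below (about 4K input and 500 output tokens) is an assumption chosen to match the figure; the site may use different values.

```python
INPUT_PRICE = 0.15 / 1_000_000    # $ per input token (from this page)
OUTPUT_PRICE = 0.60 / 1_000_000   # $ per output token (from this page)

# Assumed message size; not stated on the page.
input_tokens, output_tokens = 4_000, 500

cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"~${cost:.4f} per message")   # ~$0.0009, i.e. roughly $0.001
```
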
Available On
Meta: $0.15 / 1M input tokens
Llama 4 Maverick is an open-source multimodal AI model by Meta, released in April 2025. It has an average benchmark score of 22.0. Context window: 1M tokens.