Avg score: 22.9
Rank: #201 (better than 14% of all models)
Context: 128K tokens (~64 books)
Input: $0.35 / 1M tokens
Output: $0.56 / 1M tokens
Type: multimodal
License: Open Source
Benchmarks: 5 tested
About
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...
Tested on 5 benchmarks with 55.8% average. Top scores: HELM — WildBench (78.8%), HELM — IFEval (75.0%), HELM — MMLU-Pro (61.0%).
Looking for similar performance at lower cost?
Llama 4 Maverick scores 22.0 (96% as good) at $0.15/1M input · 57% cheaper
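The headline comparison follows directly from the figures listed on this page; a quick arithmetic check (scores and input prices taken from the page itself):

```python
# Figures from the page: avg scores and input prices (USD per 1M tokens)
mistral_score, maverick_score = 22.9, 22.0
mistral_price, maverick_price = 0.35, 0.15

relative_quality = maverick_score / mistral_score   # fraction of Mistral's avg score
price_saving = 1 - maverick_price / mistral_price   # fractional input-price reduction

print(f"{relative_quality:.0%} as good, {price_saving:.0%} cheaper")  # 96% as good, 57% cheaper
```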
Capabilities
- Reasoning: 78.8 (#12 globally)
- Math: 24.8 (#146 globally)
- Knowledge: 50.1 (#100 globally)
- Language: 75.0 (#59 globally)
Benchmark Scores
Tested on 5 benchmarks · Ranked across 4 categories
Score Distribution chart (all 233 models); this model's 22.9 average is marked on it.
Reasoning
- HELM — WildBench: 78.8. Stanford HELM WildBench evaluation; tests reasoning on challenging real-world tasks.
Math
- HELM — Omni-MATH: 24.8. Stanford HELM evaluation of mathematical reasoning across diverse problem types.
Knowledge
- HELM — MMLU-Pro: 61.0. Stanford HELM evaluation of MMLU-Pro; tests broad knowledge with increased difficulty.
Score bands: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
Links
- Research
- Documentation
- Community
- Source Code
BenchGecko API
mistral-small-3-1-24b-instruct
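Only the model slug above appears on this page; the endpoint path, host, and response shape below are assumptions for illustration, not documented BenchGecko API details. A minimal sketch of building a lookup request for this model's record:

```python
from urllib.request import Request

# Hypothetical endpoint shape; only the slug is taken from this page.
BASE = "https://api.benchgecko.example/v1"  # placeholder host, not documented here
SLUG = "mistral-small-3-1-24b-instruct"

def model_request(slug: str) -> Request:
    """Build a GET request for a model's benchmark record (path shape assumed)."""
    return Request(f"{BASE}/models/{slug}", headers={"Accept": "application/json"})

req = model_request(SLUG)
print(req.full_url)  # https://api.benchgecko.example/v1/models/mistral-small-3-1-24b-instruct
```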
Specifications
- Type: multimodal
- Context: 128K tokens (~64 books)
- Released: Mar 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001
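The ~$0.001 per-message figure can be reproduced from the listed token prices. The message size here (roughly 1K input and 1K output tokens) is an assumption, not stated on the page:

```python
# Listed prices for Mistral Small 3.1 24B Instruct (USD per 1M tokens)
INPUT_PER_M = 0.35
OUTPUT_PER_M = 0.56

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-token rates."""
    return input_tokens * INPUT_PER_M / 1e6 + output_tokens * OUTPUT_PER_M / 1e6

# An assumed ~1K-in / ~1K-out message lands near the quoted ~$0.001
print(f"${message_cost(1_000, 1_000):.5f}")  # $0.00091
```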
Frequently Asked Questions
Mistral Small 3.1 24B is an open-source multimodal AI model by Mistral AI, released in March 2025. It has an average benchmark score of 22.9. Context window: 128K tokens.