
Mistral Small 3.1 24B

by Mistral AI · Released Mar 2025

Open Source · Multimodal
22.9 avg score · Rank #201 · Better than 14% of all models
Context: 128K tokens (~64 books)
Input $/1M: $0.35
Output $/1M: $0.56
Type: multimodal
License: Open Source
Benchmarks: 5 tested
About

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...

Tested on 5 benchmarks with a 55.8% average score. Top scores: HELM — WildBench (78.8%), HELM — IFEval (75.0%), HELM — MMLU-Pro (61.0%).

Looking for similar performance at lower cost?
Llama 4 Maverick scores 22.0 (96% as good) at $0.15/1M input · 57% cheaper
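
For readers checking the math, a minimal sketch of how these comparison figures appear to be derived (an assumption: ratio of average scores, and relative difference in listed input prices):

```python
# Assumed derivation of the comparison figures above: ratio of average
# scores, and relative difference in input price per 1M tokens.
mistral_score, maverick_score = 22.9, 22.0
mistral_input, maverick_input = 0.35, 0.15  # $ per 1M input tokens

relative_quality = maverick_score / mistral_score   # ~0.96 -> "96% as good"
price_savings = 1 - maverick_input / mistral_input  # ~0.57 -> "57% cheaper"
print(f"{relative_quality:.0%} as good, {price_savings:.0%} cheaper")
```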
Capabilities
  • reasoning: 78.8 (#12 globally)
  • math: 24.8 (#146 globally)
  • knowledge: 50.1 (#100 globally)
  • language: 75.0 (#59 globally)
Benchmark Scores
Tested on 5 benchmarks · Ranked across 4 categories
Score Distribution (all 233 models): chart of scores from 0 to 100, with this model's position marked.
HELM — WildBench: 78.8
Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

HELM — Omni-MATH: 24.8
Stanford HELM evaluation of mathematical reasoning across diverse problem types.

HELM — MMLU-Pro: 61.0
Stanford HELM evaluation of MMLU-Pro. Tests broad knowledge with increased difficulty.

HELM — GPQA: 39.2
Stanford HELM evaluation of GPQA. Tests graduate-level scientific reasoning.

HELM — IFEval: 75.0
Stanford HELM evaluation of IFEval. Tests instruction-following ability.
Excellent (85+) · Good (70–85) · Average (50–70) · Below (<50)
Links
  • Documentation
  • Community
  • BenchGecko API: mistral-small-3-1-24b-instruct
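
The page exposes only the model slug above. Below is a hypothetical sketch of querying it programmatically; the base URL and response field names are assumptions, not documented BenchGecko API:

```python
# Hypothetical sketch of pulling this model's record via the BenchGecko API.
# Only the model slug appears on this page; the endpoint and payload shape
# are assumptions for illustration.
import json
import urllib.request

MODEL_SLUG = "mistral-small-3-1-24b-instruct"
url = f"https://api.benchgecko.example/v1/models/{MODEL_SLUG}"  # assumed endpoint

with urllib.request.urlopen(url) as resp:
    model = json.load(resp)

# Illustrative field names, guessed from the figures shown on the page.
print(model.get("avg_score"), model.get("context_tokens"))
```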
Specifications
  • Type: multimodal
  • Context: 128K tokens (~64 books)
  • Released: Mar 2025
  • License: Open Source
  • Status: Active
  • Cost / Message: ~$0.001 (see the sketch below)
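
The ~$0.001 per message figure follows from the listed token prices. A minimal sketch, assuming a typical message of roughly 1,000 input and 500 output tokens (the page does not state its assumed message size):

```python
# Per-message cost from the listed per-token prices. The 1,000-input /
# 500-output message size is an assumption for illustration.
INPUT_PRICE = 0.35 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.56 / 1_000_000  # $ per output token

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single chat turn at this model's listed prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

print(f"${message_cost(1_000, 500):.4f}")  # $0.0006, on the order of ~$0.001
```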
Available On
Mistral AI · $0.35/1M input
Summary
Mistral Small 3.1 24B is an open-source multimodal AI model by Mistral AI, released in March 2025. It has an average benchmark score of 22.9. Context window: 128K tokens.