Avg score: 22.9
Rank: #201 (better than 14% of all models)
Context: 128K tokens (~64 books)
Input: $0.35 / 1M tokens
Output: $0.56 / 1M tokens
Type: multimodal
License: Open Source
Benchmarks: 5 tested
About
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...
Tested on 5 benchmarks with 55.8% average. Top scores: HELM — WildBench (78.8%), HELM — IFEval (75.0%), HELM — MMLU-Pro (61.0%).
Looking for similar performance at lower cost?
Llama 4 Maverick scores 22.0 (96% as good) at $0.15/1M input · 57% cheaper
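The headline comparison follows directly from the figures listed on this page; a quick arithmetic check (scores and input prices taken from the page itself):

```python
# Figures from the page: avg scores and input prices (USD per 1M tokens)
mistral_score, maverick_score = 22.9, 22.0
mistral_price, maverick_price = 0.35, 0.15

relative_quality = maverick_score / mistral_score   # fraction of Mistral's avg score
price_saving = 1 - maverick_price / mistral_price   # fractional input-price reduction

print(f"{relative_quality:.0%} as good, {price_saving:.0%} cheaper")  # 96% as good, 57% cheaper
```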
Capabilities
- Reasoning: 78.8 (#12 globally)
- Math: 24.8 (#146 globally)
- Knowledge: 50.1 (#100 globally)
- Language: 75.0 (#59 globally)
Benchmark Scores
Tested on 5 benchmarks · Ranked across 4 categories
Score Distribution chart (all 233 models); this model's 22.9 average is marked on it.
Reasoning
- HELM — WildBench: 78.8. Stanford HELM WildBench evaluation; tests reasoning on challenging real-world tasks.
Math
- HELM — Omni-MATH: 24.8. Stanford HELM evaluation of mathematical reasoning across diverse problem types.
Knowledge
- HELM — MMLU-Pro: 61.0. Stanford HELM evaluation of MMLU-Pro; tests broad knowledge with increased difficulty.
Score bands: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
Links
- Research
- Documentation
- Community
- Source Code
BenchGecko API
mistral-small-3-1-24b-instruct
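Only the model slug above appears on this page; the endpoint path, host, and response shape below are assumptions for illustration, not documented BenchGecko API details. A minimal sketch of building a lookup request for this model's record:

```python
from urllib.request import Request

# Hypothetical endpoint shape; only the slug is taken from this page.
BASE = "https://api.benchgecko.example/v1"  # placeholder host, not documented here
SLUG = "mistral-small-3-1-24b-instruct"

def model_request(slug: str) -> Request:
    """Build a GET request for a model's benchmark record (path shape assumed)."""
    return Request(f"{BASE}/models/{slug}", headers={"Accept": "application/json"})

req = model_request(SLUG)
print(req.full_url)  # https://api.benchgecko.example/v1/models/mistral-small-3-1-24b-instruct
```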
Specifications
- Type: multimodal
- Context: 128K tokens (~64 books)
- Released: Mar 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001
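The ~$0.001 per-message figure can be reproduced from the listed token prices. The message size here (roughly 1K input and 1K output tokens) is an assumption, not stated on the page:

```python
# Listed prices for Mistral Small 3.1 24B Instruct (USD per 1M tokens)
INPUT_PER_M = 0.35
OUTPUT_PER_M = 0.56

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-token rates."""
    return input_tokens * INPUT_PER_M / 1e6 + output_tokens * OUTPUT_PER_M / 1e6

# An assumed ~1K-in / ~1K-out message lands near the quoted ~$0.001
print(f"${message_cost(1_000, 1_000):.5f}")  # $0.00091
```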
Frequently Asked Questions
Mistral Small 3.1 24B is an open-source multimodal AI model by Mistral AI, released in March 2025. It has an average benchmark score of 22.9. Context window: 128K tokens.