GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard...
Tested on 16 benchmarks with a 44.5% average. Top scores: HELM — IFEval (90.4%), MATH Level 5 (87.3%), HELM — WildBench (83.8%).
For comparison, gpt-oss-120b scores 43.7 (101% as good) at $0.04/1M input tokens, about 90% cheaper.
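A minimal sketch of the comparison arithmetic. The $0.40/1M input rate for GPT-4.1 Mini is an assumption taken from OpenAI's published pricing and is not stated on this page; only the $0.04/1M figure for gpt-oss-120b comes from the listing above.

```python
# Sketch: relative input-token cost of gpt-oss-120b vs. GPT-4.1 Mini.
# The $0.40/1M rate for GPT-4.1 Mini is assumed from OpenAI's published
# pricing; the page itself only gives gpt-oss-120b's $0.04/1M figure.

GPT_41_MINI_INPUT = 0.40   # USD per 1M input tokens (assumed)
GPT_OSS_120B_INPUT = 0.04  # USD per 1M input tokens (from the page)

savings = 1 - GPT_OSS_120B_INPUT / GPT_41_MINI_INPUT
print(f"gpt-oss-120b is {savings:.0%} cheaper on input tokens")  # -> 90% cheaper
```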
- Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.
- Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
- SWE-bench Verified solved using only bash commands, with no specialized frameworks. Tests raw terminal-based problem solving.
- Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.
- Abstraction and Reasoning Corpus (ARC). Tests fluid intelligence through novel visual pattern-recognition puzzles; designed as a core measure of general intelligence.
- ARC-AGI-2, the harder sequel to ARC. More complex abstract reasoning patterns that test generalization beyond the training data.
- Competition-level math from AMC, AIME, and Olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
- Stanford HELM evaluation of mathematical reasoning across diverse problem types.
- Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
- Type: multimodal
- Context: 1.0M tokens (~524 books)
- Released: Apr 2025
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.002
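The per-message figure follows from per-token rates. A minimal sketch of the derivation, assuming a typical exchange of roughly 1K input and 1K output tokens at OpenAI's published GPT-4.1 Mini rates ($0.40/1M input, $1.60/1M output); both the rates and the message size are assumptions, not figures from this page.

```python
# Sketch: deriving the ~$0.002/message estimate from per-token rates.
# Rates and message size are assumptions, not figures from this page.

INPUT_RATE = 0.40 / 1_000_000    # USD per input token (assumed)
OUTPUT_RATE = 1.60 / 1_000_000   # USD per output token (assumed)

input_tokens, output_tokens = 1_000, 1_000  # a typical short exchange
cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"~${cost:.4f} per message")  # -> ~$0.0020
```

Under these assumptions the estimate lands exactly on the listed ~$0.002; longer prompts or responses scale the cost linearly.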