
GPT-4.1 Mini

by OpenAI · Released Apr 2025

Multimodal · 1M Context
43.1
avg score
Rank #133
Better than 43% of all models
Context
1.0M tokens (~524 books)
Input $/1M
$0.40
Output $/1M
$1.60
Type
multimodal
License
Proprietary
Benchmarks
16 tested
About

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard...

Tested on 16 benchmarks with 44.5% average. Top scores: HELM — IFEval (90.4%), MATH level 5 (87.3%), HELM — WildBench (83.8%).

Looking for similar performance at lower cost?
gpt-oss-120b scores 43.7 (101% as good) at $0.04/1M input · 90% cheaper
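A minimal sketch of how the comparison figures above can be reproduced, assuming "% as good" is the ratio of the two average scores and "% cheaper" is the relative difference in input price; these are assumptions about how the site derives the numbers, not confirmed definitions.

```python
# Reproduce the "101% as good" and "90% cheaper" figures from the raw numbers.
# The formulas below are assumptions about how the site computes them.

def relative_score(candidate: float, baseline: float) -> float:
    """Candidate's average score as a percentage of the baseline's."""
    return 100.0 * candidate / baseline

def percent_cheaper(candidate_price: float, baseline_price: float) -> float:
    """How much cheaper the candidate's input price is, in percent."""
    return 100.0 * (1.0 - candidate_price / baseline_price)

if __name__ == "__main__":
    # GPT-4.1 Mini: 43.1 avg score, $0.40 per 1M input tokens
    # gpt-oss-120b: 43.7 avg score, $0.04 per 1M input tokens
    print(round(relative_score(43.7, 43.1)))   # -> 101  ("101% as good")
    print(round(percent_cheaper(0.04, 0.40)))  # -> 90   ("90% cheaper")
```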
Capabilities
coding
27.5
#119 globally
reasoning
29.1
#84 globally
math
46.4
#88 globally
knowledge
59.6
#50 globally
language
90.4
#9 globally
Benchmark Scores
Tested on 16 benchmarks · Ranked across 5 categories
Score Distribution (all 233 models): chart of average scores (0-100) with this model's position marked.
WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

37.6
Aider polyglot

Multi-language code editing from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.

32.4
SWE-Bench Verified (Bash Only)

SWE-bench Verified solved using only bash commands, no specialized frameworks. Tests raw terminal-based problem solving.

23.9
HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

83.8
ARC-AGI

Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Core measure of general intelligence.

3.5
ARC-AGI-2

ARC-AGI 2, harder sequel to ARC. More complex abstract reasoning patterns that test generalization ability beyond training data.

0.1
MATH level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

87.3
HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

49.1
OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.

44.7
Score legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
Links
Documentation
Community
BenchGecko API: gpt-4-1-mini
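The model's identifier on the BenchGecko API is gpt-4-1-mini. Below is a hypothetical sketch of fetching its record by that slug; the base URL, path, and response fields are placeholders, not documented API details.

```python
# Hypothetical sketch of pulling this model's record by its API slug.
# The host, path, and "avg_score" field are placeholders; consult the
# actual BenchGecko API documentation for the real schema.
import requests

BASE_URL = "https://api.benchgecko.example"  # placeholder, not the real host

def fetch_model(slug: str) -> dict:
    """Fetch a model record by its API slug (hypothetical endpoint)."""
    resp = requests.get(f"{BASE_URL}/models/{slug}", timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    model = fetch_model("gpt-4-1-mini")
    print(model.get("avg_score"))  # field name is an assumption
```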
Specifications
  • Type: multimodal
  • Context: 1.0M tokens (~524 books)
  • Released: Apr 2025
  • License: Proprietary
  • Status: Active
  • Cost / Message: ~$0.002
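The ~$0.002 per message figure can be approximated from the listed per-token prices. The token counts per message in the sketch below (roughly 1,000 input and 1,000 output tokens) are illustrative assumptions, since the page does not state what message size it uses.

```python
# Back-of-the-envelope estimate of cost per message from the listed prices.
# The assumed token counts are illustrative guesses, not the site's definition.

INPUT_PRICE_PER_1M = 0.40   # USD per 1M input tokens (from the spec above)
OUTPUT_PRICE_PER_1M = 1.60  # USD per 1M output tokens (from the spec above)

def cost_per_message(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request given token counts."""
    return (input_tokens * INPUT_PRICE_PER_1M
            + output_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

if __name__ == "__main__":
    print(f"${cost_per_message(1_000, 1_000):.4f}")  # -> $0.0020
```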
Available On
OpenAI · $0.40/1M input
Share & Export
GPT-4.1 Mini is a proprietary multimodal AI model by OpenAI, released in April 2025. It has an average benchmark score of 43.1. Context window: 1M tokens.