How much does gpt-oss-20b cost?

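gpt-oss-20b costs $0.03 per million input tokens and $0.14 per million output tokens. For a typical conversation (~2,000 tokens), that works out to well under $0.001 per message: between about $0.00006 and $0.00028, depending on how the tokens split between input and output.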
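
The per-message figure depends on how the ~2,000 tokens divide between input and output. The following is a minimal sketch of that arithmetic, assuming simple per-token billing with no caching discounts or minimum charges. The function name and the 1,500/500 split are illustrative assumptions, not part of the listing.

```python
# Prices from the listing above, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.03
OUTPUT_PRICE_PER_M = 0.14


def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under plain per-token billing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical split of a ~2,000-token conversation: 1,500 in, 500 out.
cost = message_cost(1_500, 500)
print(f"${cost:.6f}")  # prints $0.000115
```

At this scale the cost only becomes visible in aggregate: under the same assumed split, a million such messages would cost about $115.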

What benchmarks has gpt-oss-20b been tested on?

gpt-oss-20b has been evaluated on 6 benchmarks. Top scores: Chatbot Arena Elo — Overall: 1317.6, HELM — MMLU-Pro: 74.0, HELM — WildBench: 73.7.

Is gpt-oss-20b open source?

Yes. gpt-oss-20b is an open-weight model released by OpenAI under the Apache 2.0 license.

How does gpt-oss-20b compare to o1-preview?

gpt-oss-20b has an average score of 44.4 while o1-preview scores 44.5. o1-preview slightly outperforms gpt-oss-20b overall.

gpt-oss-20b

Name: gpt-oss-20b
Price: 0.029 USD
Author: OpenAI

by OpenAI · Released Aug 2025

Open Source

44.4

avg score

Rank #155

Better than 43% of all models
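
The 43% figure is consistent with the rank shown above and the 274 models counted in the score distribution on this page. A rough sketch of one way such a percentage could be derived follows; the formula and function name are assumptions for illustration, since the page does not say how it computes the value.

```python
def percent_better_than(rank: int, total_models: int) -> float:
    """Percentage of models ranked strictly below the given rank."""
    return 100 * (total_models - rank) / total_models


# Rank #155 out of 274 models, both taken from this page.
print(round(percent_better_than(155, 274)))  # prints 43
```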

Context

131K tokens (~66 books)

Input $/1M

$0.03

Output $/1M

$0.14

Type

text

License

Open Source

Benchmarks

6 tested

Data updated today

About

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...
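
To put the Mixture-of-Experts figures in proportion, the short calculation below shows what share of the model's parameters is active on each forward pass. It uses only the two parameter counts quoted above; the variable names are illustrative.

```python
# Parameter counts from the description above, in billions.
TOTAL_PARAMS_B = 21.0
ACTIVE_PARAMS_B = 3.6

# Fraction of parameters used per forward pass in the MoE layout.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"{active_fraction:.1%}")  # prints 17.1%
```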

Tested on 6 benchmarks with 67.4% average. Top scores: Chatbot Arena Elo — Overall (1317.6, an Elo rating rather than a percentage), HELM — MMLU-Pro (74.0%), HELM — WildBench (73.7%).

Capabilities

reasoning

73.7

#26 globally

math

56.5

#68 globally

knowledge

66.7

#31 globally

language

73.2

#71 globally

Benchmark Scores

Tested on 6 benchmarks · Ranked across 5 categories

Score Distribution (all 274 models), plotted on a 0 to 100 scale with this model's position marked.

reasoning

HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

73.7

math

HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

56.5

knowledge

HELM — MMLU-Pro

Stanford HELM evaluation of MMLU-Pro. Tests broad knowledge with increased difficulty.

74.0

HELM — GPQA

Stanford HELM evaluation of GPQA. Tests graduate-level scientific reasoning.

59.4

Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
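
As a quick illustration of how the legend applies to the HELM scores listed above, the sketch below maps each score to a band. The legend's ranges share their endpoints (for example, 85 appears in two bands), so this sketch assumes each lower bound is inclusive; that choice and the function name are assumptions, not something the page states.

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to a legend band (lower bounds inclusive)."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# HELM scores as listed on this page.
helm_scores = {"WildBench": 73.7, "Omni-MATH": 56.5, "MMLU-Pro": 74.0, "GPQA": 59.4}
for name, score in helm_scores.items():
    print(f"{name}: {score} -> {score_band(score)}")
```

Under that assumption, WildBench and MMLU-Pro fall in the Good band, while Omni-MATH and GPQA fall in Average.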

Similar Models

Gemini 2.0 Flash (Dec 2024)

Google DeepMind

44.5 · TBD

Specifications

Type: text
Context: 131K tokens (~66 books)
Released: Aug 2025
License: Open Source
Status: Active
Cost / Message: under $0.001

Available On

OpenAI: $0.03

Frequently Asked Questions

gpt-oss-20b is an open-source text AI model by OpenAI, released in August 2025. It has an average benchmark score of 44.4. Context window: 131K tokens.

Benchmarks

Chatbot Arena Elo — Overall, HELM — MMLU-Pro, HELM — WildBench, HELM — IFEval, HELM — GPQA
