

Research note · Live pricing dataset

API Pricing Compression Monitor

A live BenchGecko note tracking where AI API prices are compressing, which providers anchor the low cost floor, and which premium models still sit far above the market median.

Dataset date: May 6, 2026 (BenchGecko generated data)

Priced models: 386 (54 providers with input and output prices)

Median input price: $0.385 per one million input tokens

Median output price: $1.20 per one million output tokens

Low cost band: 155 models at or below $0.50 blended

Finding 01

The low cost cluster is already deep.

155 priced models sit at or below $0.50 blended per one million tokens, using a workload mix of 3 input tokens to 1 output token. The bottom of the market is therefore wide enough to create real substitution pressure.
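As a worked example, assuming the 3:1 mix is applied as a weighted average of the per-million-token input and output prices (the note states the mix but not the exact formula), Llama 3.1 8B Instruct from the low cost floor lands well inside the band:

```latex
P_{\text{blend}} = \frac{3\,P_{\text{in}} + 1\,P_{\text{out}}}{3 + 1}
\qquad
\frac{3 \times 0.02 + 0.05}{4} = \frac{0.11}{4} = 0.0275 \le 0.50
```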

Finding 02

Premium pricing still has a long tail.

35 priced models sit above $5 blended. The monitor treats these as premium outliers until benchmark strength, latency, context, or specialist capability explains the gap.
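The two thresholds used so far ($0.50 and $5 blended) split the market into three bands. A minimal sketch of that split, assuming the same 3:1 weighted average; the function names, band labels, and the mid band itself are hypothetical, and only the thresholds and the two example prices come from this note.

```python
# Price bands from this note's thresholds. The blend formula (3:1 weighted
# average) and all names here are assumptions made for illustration.

LOW_COST_MAX = 0.50   # "at or below $0.50 blended"
PREMIUM_MIN = 5.00    # "above $5 blended"


def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blend per-million-token prices with 3 input tokens per output token."""
    return (3 * input_per_m + output_per_m) / 4


def price_band(input_per_m: float, output_per_m: float) -> str:
    """Return 'low_cost', 'mid', or 'premium' for one model."""
    blended = blended_price(input_per_m, output_per_m)
    if blended <= LOW_COST_MAX:
        return "low_cost"
    if blended > PREMIUM_MIN:
        return "premium"
    return "mid"


# Prices in USD per one million tokens, copied from the lists in this note.
examples = {
    "Llama 3.1 8B Instruct": (0.02, 0.05),
    "Claude Opus 4.6 (Fast)": (30.0, 150.0),
}

for name, (inp, out) in examples.items():
    print(f"{name}: {price_band(inp, out)}")
```

Note the boundary handling: a model at exactly $0.50 blended counts as low cost, while one at exactly $5 stays out of the premium band, matching the "at or below" and "above" wording.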

Finding 03

Output tokens remain the expensive side.

The median output price is $1.20 per one million tokens versus $0.385 for input, about 3.1 times higher. The ratio of the 90th to the 10th percentile price is 77.8x for output and 46.2x for input.
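The spread figure is a ratio of two percentiles. A minimal sketch of how such a ratio can be computed, assuming linear interpolation between sorted prices (Python's `statistics.quantiles` with `method="inclusive"`); the note does not say which percentile method it uses, and the sample prices below are hypothetical, not dataset values.

```python
import statistics


def percentile_spread(prices: list[float]) -> float:
    """Ratio of the 90th to the 10th percentile of a list of prices.

    Uses statistics.quantiles(method="inclusive"), which interpolates
    linearly between sorted values. This is an assumed method.
    """
    deciles = statistics.quantiles(prices, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]  # first and last of the 9 cut points
    return p90 / p10


# Hypothetical output prices (USD per one million tokens), not dataset values.
sample = [0.05, 0.1, 0.2, 0.4, 0.6, 1.2, 2.0, 4.0, 8.0, 10.0, 15.0]
print(f"median: {statistics.median(sample)}")
print(f"p90/p10 spread: {percentile_spread(sample):.1f}x")
```

Because the ratio divides by the 10th percentile, very cheap models at the floor inflate the spread quickly, which is one reason the output spread can be much larger than the input spread.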

Low cost floor

Models with the cheapest blended token prices in the current dataset. Input and output prices are per one million tokens.

liquid · input $0.01 · output $0.02
liquid · input $0.01 · output $0.02
Mistral AI · input $0.02 · output $0.03
Llama 3.1 8B Instruct · Meta · input $0.02 · output $0.05
Llama 3 8B Instruct · Meta · input $0.03 · output $0.04
Granite 4.0 Micro · ibm-granite · input $0.017 · output $0.11
Llama 3 8B Lunaris · sao10k · input $0.04 · output $0.05
Google DeepMind · input $0.03 · output $0.09
Qwen2.5 Coder 7B Instruct · Alibaba Qwen · input $0.03 · output $0.09
Google DeepMind · input $0.04 · output $0.08
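Ranking by blended price rather than by input price alone changes the order. A minimal sketch, assuming the 3:1 weighted average; the four rows are copied from the list above, and the tuple layout and function name are illustrative.

```python
# Rank a few low cost floor entries by blended price (assumed 3:1 weighted
# average). Prices are USD per one million tokens, copied from the list above.

floor = [
    ("Granite 4.0 Micro", 0.017, 0.11),
    ("Llama 3 8B Instruct", 0.03, 0.04),
    ("Llama 3.1 8B Instruct", 0.02, 0.05),
    ("Llama 3 8B Lunaris", 0.04, 0.05),
]


def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Weight input 3:1 against output, per one million tokens."""
    return (3 * input_per_m + output_per_m) / 4


# Sort ascending so the cheapest blended price comes first.
ranked = sorted(floor, key=lambda row: blended_price(row[1], row[2]))
for name, inp, out in ranked:
    print(f"{name}: ${blended_price(inp, out):.4f} blended")
```

Granite 4.0 Micro has the lowest input price of the four but ranks third once its higher output price is weighted in, which matches the order in the list above.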

Provider compression table

Providers with at least three priced models, ranked by median blended price.

Priced models per provider, in ranked order: 3, 12, 5, 7, 4, 5, 5, 53, 4, 3.
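The table's rule (at least three priced models, ranked by median blended price) can be sketched as a group-filter-sort. This assumes the 3:1 weighted average and ascending order (cheapest first), neither of which the note spells out; the provider names and prices in `records` are hypothetical placeholders, not BenchGecko data.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (provider, input price, output price) in USD per one
# million tokens. Names and prices are placeholders, not dataset values.
records = [
    ("provider_a", 0.02, 0.05),
    ("provider_a", 0.03, 0.04),
    ("provider_a", 0.10, 0.30),
    ("provider_b", 1.00, 4.00),
    ("provider_b", 3.00, 15.00),
    ("provider_b", 0.50, 1.50),
    ("provider_b", 2.00, 8.00),
    ("provider_c", 0.01, 0.02),  # only one model: filtered out below
]

MIN_MODELS = 3  # "Providers with at least three priced models"


def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Assumed 3:1 input-to-output weighted average."""
    return (3 * input_per_m + output_per_m) / 4


def provider_table(rows, min_models: int = MIN_MODELS):
    """Return (provider, model count, median blended price), cheapest first."""
    by_provider = defaultdict(list)
    for provider, inp, out in rows:
        by_provider[provider].append(blended_price(inp, out))
    table = [
        (provider, len(prices), statistics.median(prices))
        for provider, prices in by_provider.items()
        if len(prices) >= min_models
    ]
    return sorted(table, key=lambda row: row[2])


for provider, count, median in provider_table(records):
    print(f"{provider}: {count} priced models, median ${median:.3f} blended")
```

Using the median rather than the mean keeps a single premium model from dragging a provider with many cheap models up the table.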

Premium watchlist

High-priced models that need capability context before their price alone is read as market power. Input and output prices are per one million tokens.

OpenAI · input $150 · output $600
OpenAI · input $30 · output $180
OpenAI · input $30 · output $180
Claude Opus 4.6 (Fast) · Anthropic · input $30 · output $150
OpenAI · input $21 · output $168
OpenAI · input $15 · output $120
OpenAI · input $30 · output $60
GPT-4 (older v0314) · OpenAI · input $30 · output $60
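One way to express how far the watchlist sits above the market is as a multiple of the dataset medians reported in this note ($0.385 input, $1.20 output). A minimal sketch; the multiple-of-median metric and the function name are assumptions for illustration, not a measure the note defines.

```python
# Multiples of the median prices reported in this note (USD per one million
# tokens). The metric itself is an illustrative assumption.

MEDIAN_INPUT = 0.385
MEDIAN_OUTPUT = 1.20

watchlist = {
    "Claude Opus 4.6 (Fast)": (30.0, 150.0),
    "GPT-4 (older v0314)": (30.0, 60.0),
}


def median_multiples(input_per_m: float, output_per_m: float) -> tuple[float, float]:
    """How many times the median input and output prices a model charges."""
    return input_per_m / MEDIAN_INPUT, output_per_m / MEDIAN_OUTPUT


for name, (inp, out) in watchlist.items():
    in_x, out_x = median_multiples(inp, out)
    print(f"{name}: {in_x:.0f}x median input, {out_x:.0f}x median output")
```

By the same arithmetic, the top OpenAI entry's $600 output price is 500 times the $1.20 median.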

Methodology and caveats

Prices are taken from BenchGecko model records and expressed per one million tokens where available.

The blended price assumes a workload mix of 3 input tokens for every output token. It is a comparison lens, not a universal workload model.
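Why the 3:1 mix is only a lens can be shown with two models from the low cost floor whose order flips when the mix changes. A minimal sketch, assuming the blend is a weighted average; the prices come from this note, while the 10:1 alternative mix and the function signature are hypothetical.

```python
# Compare two low cost floor models under different input:output mixes.
# Prices are USD per one million tokens from this note; the 10:1 mix is an
# arbitrary illustrative alternative to the note's 3:1 lens.


def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3, output_weight: float = 1) -> float:
    """Weighted average price for a given input:output token ratio."""
    total = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total


granite = (0.017, 0.11)  # Granite 4.0 Micro
llama3 = (0.03, 0.04)    # Llama 3 8B Instruct

for in_w in (3, 10):
    granite_price = blended_price(*granite, input_weight=in_w)
    llama_price = blended_price(*llama3, input_weight=in_w)
    cheaper = "Granite 4.0 Micro" if granite_price < llama_price else "Llama 3 8B Instruct"
    print(f"{in_w}:1 mix -> cheaper: {cheaper} ({granite_price:.4f} vs {llama_price:.4f})")
```

With the 3:1 mix Llama 3 8B Instruct comes out cheaper; at 10:1 Granite 4.0 Micro does, because its input price is lower and its output price higher. Input-heavy workloads such as long-document summarization would therefore rank these two models differently than the monitor does.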

Compression does not mean quality convergence. Benchmark coverage, latency, context, tools, and reliability still matter.
