Does quality drop near the 1M limit?

Yes, usually. Most 1M-context models lose accuracy on needle-in-haystack tests past 500K tokens. Gemini 2.5 Pro holds up best on long-context benchmarks. For critical work past 500K tokens, test the model on your own data before relying on it.
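One way to run such a test is to bury a known fact at several depths in a long filler document and check whether the model retrieves it. The sketch below only builds the prompts and scores answers; the function names, the filler text, and the needle are illustrative assumptions, and word count is used as a rough stand-in for token count.

```python
def build_haystack(filler: str, needle: str, total_words: int, depth: float) -> str:
    """Return about total_words of filler with the needle inserted at a
    relative depth (0.0 = start of the document, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    filler_words = filler.split()
    if not filler_words:
        raise ValueError("filler must contain at least one word")
    # Repeat the filler until it covers the requested length, then trim.
    repeats = total_words // len(filler_words) + 1
    words = (filler_words * repeats)[:total_words]
    position = int(len(words) * depth)
    return " ".join(words[:position] + [needle] + words[position:])


def found_needle(model_answer: str, expected: str) -> bool:
    """Case-insensitive check that the expected fact appears in the answer."""
    return expected.lower() in model_answer.lower()


# Hypothetical needle and question; swap in facts from your own domain.
NEEDLE = "The secret launch code is 7421."
QUESTION = "What is the secret launch code?"

prompts = {
    depth: build_haystack("The quick brown fox jumps over the lazy dog.",
                          NEEDLE, 1000, depth) + "\n\n" + QUESTION
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0)
}
# Send each prompt to the model under test with your own client code,
# then score the reply with found_needle(reply, "7421").
```

Scale `total_words` up toward the window size you actually plan to use; accuracy at 1,000 words says nothing about accuracy at 500K tokens.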

Is 1M context cheaper than chunked RAG?

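Not always. At the input rates listed on this page, a full 1M-token prompt costs roughly $0.07 to $2.00 per call before output, and that cost repeats on every call. A well-tuned RAG pipeline with an 8K context and good retrieval often wins on both cost and accuracy. Use 1M context when retrieval quality is the bottleneck.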
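The size of the gap is easy to estimate. The sketch below compares input cost only, using the $1.25 per 1M input rate listed for Gemini 2.5 Pro on this page; it deliberately ignores what the RAG side spends on embeddings, vector storage, and retrieval, so treat the ratio as an upper bound on the savings.

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Input cost of one call at a given price per 1M input tokens."""
    return tokens / 1_000_000 * price_per_million


PRICE = 1.25  # $/1M input tokens, Gemini 2.5 Pro as listed on this page

full_context = input_cost_usd(1_000_000, PRICE)  # whole corpus in the prompt
rag_context = input_cost_usd(8_000, PRICE)       # 8K tokens of retrieved chunks

print(f"1M-token prompt: ${full_context:.2f} per call")  # $1.25
print(f"8K RAG prompt:   ${rag_context:.2f} per call")   # $0.01
print(f"Ratio: about {full_context / rag_context:.0f}x") # about 125x
```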

How much does a single 1M-token call cost?

Roughly $0.07 on the cheap end and $2.00 on the premium end per 1M input tokens, excluding free tiers and routers with no fixed rate. Output bills on top at the model's output rate.
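As a worked example, the sketch below adds input and output cost for one call. The prices are the ones listed on this page for Qwen3.5-Flash and GPT-4.1; the 2,000-token output length is an assumption chosen for illustration.

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  in_price: float, out_price: float) -> float:
    """Total cost of one call; prices are in dollars per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# 1M tokens in, 2,000 tokens out (assumed answer length).
cheap = call_cost_usd(1_000_000, 2_000, in_price=0.07, out_price=0.26)
premium = call_cost_usd(1_000_000, 2_000, in_price=2.00, out_price=8.00)

print(f"Qwen3.5-Flash: ${cheap:.5f}")  # $0.07052
print(f"GPT-4.1:       ${premium:.3f}")  # $2.016
```

With a prompt this large, input dominates the bill: the output share is under one percent in both cases.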

Can I cache a 1M-token context?

Yes. Anthropic, Google, and OpenAI all support context caching. This is the single biggest cost lever for long-context work: cached reads typically cost 10 to 30 percent of a fresh read.
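To see what that lever is worth, the sketch below prices a workload that asks many questions against the same 1M-token context. It is a simplified model under stated assumptions: the first call pays the full input rate, every later call reads the context from cache at a fixed fraction of that rate, and cache write surcharges, storage fees, and cache expiry are all ignored. Providers differ on those details, so check the actual pricing terms before budgeting.

```python
def workload_input_cost(context_tokens: int, calls: int,
                        in_price: float, cached_fraction: float) -> float:
    """Input cost of `calls` requests sharing one cached context.

    Simplification: one fresh read, then (calls - 1) cached reads billed
    at cached_fraction of the fresh price. No write or storage fees.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    fresh = context_tokens / 1_000_000 * in_price
    return fresh + (calls - 1) * fresh * cached_fraction


CONTEXT, CALLS, PRICE = 1_000_000, 20, 2.00  # $2.00/1M is the premium end here

uncached = workload_input_cost(CONTEXT, CALLS, PRICE, cached_fraction=1.0)
cached_10 = workload_input_cost(CONTEXT, CALLS, PRICE, cached_fraction=0.10)
cached_30 = workload_input_cost(CONTEXT, CALLS, PRICE, cached_fraction=0.30)

print(f"No caching:         ${uncached:.2f}")   # $40.00
print(f"Cached reads at 10%: ${cached_10:.2f}")  # $5.80
print(f"Cached reads at 30%: ${cached_30:.2f}")  # $13.40
```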

Cheapest 1M context LLMs

Every LLM with a 1,000,000+ token context window. Ranked by input price per 1M tokens.

Models: 40

Cheapest: $0.00 (free tiers); $0.07 for the cheapest paid model

Min context: 1M tokens

What this page is

This page lists every priced model with a context window of at least one million tokens. 1M context unlocks whole-repo coding, book-length analysis, and massive multi-document RAG without chunking. The cost per call can be steep, so compare carefully and lean on context caching whenever possible.

Ranked by input price

1M+ context models, cheapest first.

#	Model	Provider	In $/1M	Out $/1M	Context	Type
1	Auto Router	OpenRouter	variable	variable	2.0M	Closed
2	Lyria 3 Clip Preview	Google DeepMind	$0.00	$0.00	1.0M	Closed
3	Lyria 3 Pro Preview	Google DeepMind	$0.00	$0.00	1.0M	Closed
4	Qwen3.6 Plus (free)	Alibaba Qwen	$0.00	$0.00	1.0M	Closed
5	Qwen3.6 Plus Preview (free)	Alibaba Qwen	$0.00	$0.00	1.0M	OSS
6	Qwen3.5-Flash	Alibaba Qwen	$0.07	$0.26	1.0M	OSS
7	Gemini 2.0 Flash Lite	Google DeepMind	$0.07	$0.30	1.0M	Closed
8	Gemini 2.0 Flash	Google DeepMind	$0.10	$0.40	1.0M	Closed
9	Gemini 2.5 Flash Lite	Google DeepMind	$0.10	$0.40	1.0M	Closed
10	Gemini 2.5 Flash Lite Preview 09-2025	Google DeepMind	$0.10	$0.40	1.0M	Closed
11	GPT-4.1 Nano	OpenAI	$0.10	$0.40	1.0M	Closed
12	Llama 4 Maverick	Meta	$0.15	$0.60	1.0M	OSS
13	Qwen3 Coder Flash	Alibaba Qwen	$0.20	$0.97	1.0M	OSS
14	Grok 4 Fast	xAI	$0.20	$0.50	2.0M	Closed
15	Grok 4.1 Fast	xAI	$0.20	$0.50	2.0M	Closed
16	MiniMax-01	MiniMax	$0.20	$1.10	1.0M	OSS
17	Gemini 3.1 Flash Lite Preview	Google DeepMind	$0.25	$1.50	1.0M	Closed
18	Qwen Plus 0728	Alibaba Qwen	$0.26	$0.78	1.0M	OSS
19	Qwen Plus 0728 (thinking)	Alibaba Qwen	$0.26	$0.78	1.0M	OSS
20	Qwen-Plus	Alibaba Qwen	$0.26	$0.78	1.0M	OSS
21	Qwen3.5 Plus 2026-02-15	Alibaba Qwen	$0.26	$1.56	1.0M	OSS
22	Gemini 2.5 Flash	Google DeepMind	$0.30	$2.50	1.0M	Closed
23	Nova 2 Lite	Amazon	$0.30	$2.50	1.0M	Closed
24	Qwen3.6 Plus	Alibaba Qwen	$0.33	$1.95	1.0M	OSS
25	GPT-4.1 Mini	OpenAI	$0.40	$1.60	1.0M	Closed
26	MiniMax M1	MiniMax	$0.40	$2.20	1.0M	Closed
27	Gemini 3 Flash Preview	Google DeepMind	$0.50	$3.00	1.0M	Closed
28	Palmyra X5	Writer	$0.60	$6.00	1.0M	Closed
29	Qwen3 Coder Plus	Alibaba Qwen	$0.65	$3.25	1.0M	OSS
30	MiMo-V2-Pro	Xiaomi	$1.00	$3.00	1.0M	Closed
31	Gemini 2.5 Pro	Google DeepMind	$1.25	$10.00	1.0M	Closed
32	Gemini 2.5 Pro Preview 05-06	Google DeepMind	$1.25	$10.00	1.0M	Closed
33	Gemini 2.5 Pro Preview 06-05	Google DeepMind	$1.25	$10.00	1.0M	Closed
34	Gemini 3.1 Pro Preview	Google DeepMind	$2.00	$12.00	1.0M	Closed
35	Gemini 3.1 Pro Preview Custom Tools	Google DeepMind	$2.00	$12.00	1.0M	Closed
36	GPT-4.1	OpenAI	$2.00	$8.00	1.0M	Closed
37	Grok 4.20	xAI	$2.00	$6.00	2.0M	Closed
38	Grok 4.20 Beta	xAI	$2.00	$6.00	2.0M	Closed
39	Grok 4.20 Multi-Agent	xAI	$2.00	$6.00	2.0M	Closed
40	Grok 4.20 Multi-Agent Beta	xAI	$2.00	$6.00	2.0M	Closed

Auto Router has no fixed per-token rate; the source data reports a negative placeholder for it, so its position at the top of the ranking is not meaningful.
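When ranking from a raw price feed, placeholder values need to be filtered out before sorting, otherwise a negative price (as a router with no fixed rate may report) lands in first place. A minimal sketch, assuming rows arrive as (model, input price) pairs; the four sample rows and both function names are illustrative.

```python
from typing import Optional

# Sample rows: (model, input price in $ per 1M tokens).
ROWS = [
    ("GPT-4.1", 2.00),
    ("Auto Router", -1_000_000.00),  # placeholder, not a real price
    ("Qwen3.5-Flash", 0.07),
    ("Qwen3.6 Plus (free)", 0.00),
]


def rank_by_input_price(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Drop negative placeholder prices, then sort cheapest first."""
    valid = [row for row in rows if row[1] >= 0]
    return sorted(valid, key=lambda row: row[1])


def cheapest_paid(rows: list[tuple[str, float]]) -> Optional[tuple[str, float]]:
    """Cheapest model with a nonzero price, or None if there is none."""
    paid = [row for row in rank_by_input_price(rows) if row[1] > 0]
    return paid[0] if paid else None


ranked = rank_by_input_price(ROWS)
print([name for name, _ in ranked])
# ['Qwen3.6 Plus (free)', 'Qwen3.5-Flash', 'GPT-4.1']
print(cheapest_paid(ROWS))
# ('Qwen3.5-Flash', 0.07)
```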

Top 3 cheapest 1M context LLMs

Auto Router delivers a 2.0M context window, but it has no fixed input price; its negative listed rate is a placeholder, not a real price. Suitable for whole-codebase analysis, long legal docs, and book-scale RAG.

Lyria 3 Clip Preview delivers a 1.0M context window at $0.00/M input tokens. Suitable for whole-codebase analysis, long legal docs, and book-scale RAG.

Lyria 3 Pro Preview delivers a 1.0M context window at $0.00/M input tokens. Suitable for whole-codebase analysis, long legal docs, and book-scale RAG.

The price gap · cheapest vs most expensive

Cheapest

Qwen3.5-Flash (cheapest model with a nonzero fixed price)

$0.07/M

$ per 1M input tokens

Why the gap

Premium 1M-context models charge for better accuracy at the tail of the window and faster ingestion. For research and one-shot analysis, the cheap end often delivers comparable answers, though this is worth verifying on your own prompts.

Most expensive

Gemini 3.1 Pro Preview

$2.00/M

$ per 1M input tokens

Frequently asked questions

Which models support a 1M context window?

Gemini 2.5 Pro and Flash were the first to ship a real 1M window. Claude Sonnet has since been extended to 1M. Qwen3 Long is a strong open-source option. MiniMax and several other Chinese labs also ship 1M+ windows. See the table above for the current list.
