Which models cap at 128K?

Smaller frontier models (GPT-4o-mini on some tiers, legacy Claude Haiku versions), mid-tier open-source models (Llama 3.3, some Mistral variants), and older Qwen and DeepSeek lines. Many newer models ship with 128K as the baseline and offer 200K+ as an extended window.

Is the cheapest 128K model good enough for RAG?

Usually yes. RAG typically sends 4K to 32K tokens of retrieved context. A 128K window is 4x to 32x that, so the lowest-priced 128K model is often the best-value RAG backbone.
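The headroom arithmetic above, plus "pick the cheapest model that fits", can be sketched in a few lines. This is a minimal illustration, not a provider API: the `Model` record and the `headroom` and `cheapest_fit` helpers are hypothetical names, the three rows are a subset of the table on this page, and the 131K window is approximated as 131,000 tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    # Hypothetical record type for illustration only.
    name: str
    input_price_per_m: float  # USD per 1M input tokens
    context_tokens: int


def headroom(context_tokens: int, prompt_tokens: int) -> float:
    """How many times the prompt fits inside the context window."""
    return context_tokens / prompt_tokens


def cheapest_fit(models: list, prompt_tokens: int) -> Optional[Model]:
    """Return the lowest-priced model whose window holds the prompt.

    Negative prices are assumed to be placeholders, not real rates,
    so those models are skipped.
    """
    candidates = [
        m for m in models
        if m.input_price_per_m >= 0 and m.context_tokens >= prompt_tokens
    ]
    return min(candidates, key=lambda m: m.input_price_per_m, default=None)


models = [
    Model("Mistral Nemo", 0.02, 131_000),  # "131K" approximated
    Model("Nova Micro 1.0", 0.04, 128_000),
    Model("Command R7B (12-2024)", 0.04, 128_000),
]

print(headroom(128_000, 32_000))  # 4.0
print(headroom(128_000, 4_000))   # 32.0
print(cheapest_fit(models, 32_000).name)
```

With a 32K RAG prompt, every model in the subset fits, so the choice falls to price alone, which is the point the answer above makes.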

Does 128K mean accurate up to 128K?

No. Many 128K models lose fidelity past 64K tokens. Gemini and Claude hold up best. If retrieval quality matters, use RAG to keep the prompt short and pick a model with strong reasoning.
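"RAG with short context" mostly comes down to enforcing a token budget on what you retrieve. A minimal sketch, assuming chunks arrive as `(text, relevance_score, token_count)` tuples with token counts already computed by the target model's tokenizer; `pack_context` is a hypothetical helper, and the 32,000-token budget is just the upper end of the RAG range mentioned above.

```python
def pack_context(chunks: list, budget_tokens: int) -> list:
    """Greedily keep the highest-scoring chunks that fit in budget_tokens.

    Each chunk is (text, relevance_score, token_count). A chunk that would
    overflow the budget is skipped, and smaller lower-ranked chunks may
    still be added after it.
    """
    selected = []
    used = 0
    for text, _score, tokens in sorted(chunks, key=lambda c: c[1], reverse=True):
        if used + tokens <= budget_tokens:
            selected.append(text)
            used += tokens
    return selected


# Hypothetical retrieval results: (text, relevance score, token count).
chunks = [
    ("intro", 0.41, 9_000),
    ("api-spec", 0.92, 20_000),
    ("changelog", 0.35, 15_000),
    ("faq", 0.77, 10_000),
]
print(pack_context(chunks, budget_tokens=32_000))  # ['api-spec', 'faq']
```

Capping the budget well below 64K keeps the prompt out of the range where many 128K models start to lose fidelity, regardless of how large the advertised window is.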

Why pay more at this tier?

Premium 128K models (Opus, GPT-5) win on quality per token, tool use, and output speed. If you care about latency or reasoning quality, pay up. If you just need a big window, pick the cheap one.


Cheapest 128K context LLMs

Every LLM with a context window of 128,000 tokens or more, ranked by input price per 1M tokens.

Models: 40

Cheapest: $-1000000.00 (a placeholder value, not a real rate)

Min context: 128K tokens


What this page is

128K is the modern baseline for LLM context. Every model with per-1M-token pricing and a 128K+ window is listed here, cheapest first. For RAG pipelines, long conversations, and mid-sized documents, this is the sweet spot.

Ranked by input price

128K+ context models, cheapest first.

#	Model	Provider	In $/1M	Out $/1M	Context	Type
1	Auto Router	openrouter	n/a	n/a	2.0M	Closed
2	Body Builder (beta)	openrouter	n/a	n/a	128K	Closed
3	Pareto Code Router	openrouter	n/a	n/a	200K	Closed
4	Elephant	openrouter	$0.00	$0.00	262K	Closed
5	Free Models Router	openrouter	$0.00	$0.00	200K	Closed
6	Gemma 3 27B (free)	Google DeepMind	$0.00	$0.00	131K	OSS
7	Gemma 4 26B A4B (free)	Google DeepMind	$0.00	$0.00	262K	OSS
8	Gemma 4 31B (free)	Google DeepMind	$0.00	$0.00	262K	OSS
9	GLM 4.5 Air (free)	z-ai	$0.00	$0.00	131K	OSS
10	gpt-oss-120b (free)	OpenAI	$0.00	$0.00	131K	OSS
11	gpt-oss-20b (free)	OpenAI	$0.00	$0.00	131K	OSS
12	Hermes 3 405B Instruct (free)	nousresearch	$0.00	$0.00	131K	OSS
13	Hy3 preview (free)	tencent	$0.00	$0.00	262K	Closed
14	Llama 3.2 3B Instruct (free)	Meta	$0.00	$0.00	131K	OSS
15	Llama Guard 4 12B (free)	Meta	$0.00	$0.00	164K	Closed
16	Lyria 3 Clip Preview	Google DeepMind	$0.00	$0.00	1.0M	Closed
17	Lyria 3 Pro Preview	Google DeepMind	$0.00	$0.00	1.0M	Closed
18	MiniMax M2.5 (free)	minimax	$0.00	$0.00	197K	OSS
19	Mistral Small 3.1 24B (free)	Mistral AI	$0.00	$0.00	128K	OSS
20	Nemotron 3 Nano 30B A3B (free)	NVIDIA	$0.00	$0.00	256K	OSS
21	Nemotron 3 Nano Omni (free)	NVIDIA	$0.00	$0.00	256K	Closed
22	Nemotron 3 Super (free)	NVIDIA	$0.00	$0.00	262K	OSS
23	Nemotron Nano 12B 2 VL (free)	NVIDIA	$0.00	$0.00	128K	OSS
24	Nemotron Nano 9B V2 (free)	NVIDIA	$0.00	$0.00	128K	OSS
25	Owl Alpha	openrouter	$0.00	$0.00	1.0M	Closed
26	Qwen3 Coder 480B A35B (free)	Alibaba Qwen	$0.00	$0.00	262K	OSS
27	Qwen3 Next 80B A3B Instruct (free)	Alibaba Qwen	$0.00	$0.00	262K	OSS
28	Qwen3.6 Plus (free)	Alibaba Qwen	$0.00	$0.00	1.0M	Closed
29	Qwen3.6 Plus Preview (free)	Alibaba Qwen	$0.00	$0.00	1.0M	OSS
30	Step 3.5 Flash (free)	stepfun	$0.00	$0.00	256K	OSS
31	Trinity Large Preview (free)	arcee-ai	$0.00	$0.00	131K	OSS
32	Trinity Mini (free)	arcee-ai	$0.00	$0.00	131K	OSS
33	Granite 4.0 Micro	ibm-granite	$0.02	$0.11	131K	OSS
34	Mistral Nemo	Mistral AI	$0.02	$0.03	131K	OSS
35	gpt-oss-20b	OpenAI	$0.03	$0.14	131K	OSS
36	Qwen-Turbo	Alibaba Qwen	$0.03	$0.13	131K	OSS
37	Nova Micro 1.0	Amazon	$0.04	$0.14	128K	Closed
38	Command R7B (12-2024)	Cohere	$0.04	$0.15	128K	Closed
39	gpt-oss-120b	OpenAI	$0.04	$0.18	131K	OSS
40	Gemma 3 12B	Google DeepMind	$0.04	$0.13	131K	OSS

Rows 1 to 3 carry a placeholder price of $-1000000.00 per 1M tokens, not a real rate, so their prices are shown as n/a. They sort first only because of that placeholder.
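The ranking rule behind the table can be sketched as a filter plus a sort. This is a rough illustration under stated assumptions: the rows are a handful copied from the table above, "131K" and "2.0M" are approximated as 131,000 and 2,000,000 tokens, negative prices are treated as placeholders and excluded, and ties are broken by model name (the page itself does not say how it orders ties).

```python
# Each row: (model, provider, input $/1M, output $/1M, context tokens).
# A few rows copied from the table above; context sizes approximated.
ROWS = [
    ("Gemma 3 12B", "Google DeepMind", 0.04, 0.13, 131_000),
    ("Auto Router", "openrouter", -1_000_000.00, -1_000_000.00, 2_000_000),
    ("Mistral Nemo", "Mistral AI", 0.02, 0.03, 131_000),
    ("Nova Micro 1.0", "Amazon", 0.04, 0.14, 128_000),
    ("Granite 4.0 Micro", "ibm-granite", 0.02, 0.11, 131_000),
]

MIN_CONTEXT = 128_000


def rank_by_input_price(rows, min_context=MIN_CONTEXT):
    """Keep rows with a window of at least min_context tokens and a
    non-negative input price, then sort cheapest input price first.
    Ties are broken by lowercase model name (an assumption of this sketch)."""
    eligible = [r for r in rows if r[4] >= min_context and r[2] >= 0]
    return sorted(eligible, key=lambda r: (r[2], r[0].lower()))


for rank, (name, provider, price_in, _price_out, ctx) in enumerate(
    rank_by_input_price(ROWS), start=1
):
    print(f"{rank}\t{name}\t{provider}\t${price_in:.2f}\t{ctx:,}")
```

Filtering out negative prices before sorting keeps placeholder entries from landing at the top of a "cheapest first" list.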

Top 3 cheapest 128K context LLMs

Auto Router offers a 2.0M context window. Its listed input price ($-1000000.00/M) is a placeholder, not a real rate.

Body Builder (beta) offers a 128K context window. Its listed input price ($-1000000.00/M) is a placeholder, not a real rate.

Pareto Code Router offers a 200K context window. Its listed input price ($-1000000.00/M) is a placeholder, not a real rate.

The price gap · cheapest vs most expensive

Cheapest: Auto Router, listed at $-1000000.00 per 1M input tokens (a placeholder value, not a real rate)

Why the gap

At 128K, premium pricing pays for reasoning quality and vendor reliability, not window size. For RAG backbones, the cheap end almost always wins.
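To see what a per-1M-token price means per call, multiply each token count by its price and divide by one million. A minimal sketch: `request_cost` is a hypothetical helper, the request shape (32K tokens of retrieved context, a 500-token answer) is an assumed RAG workload, and the prices are the listed rates for Mistral Nemo and Gemma 3 12B from the table above.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Assumed RAG request shape: 32K tokens of retrieved context, 500-token answer.
IN_TOKENS, OUT_TOKENS = 32_000, 500

# Listed prices (input, output) in USD per 1M tokens.
nemo = request_cost(IN_TOKENS, OUT_TOKENS, 0.02, 0.03)
gemma = request_cost(IN_TOKENS, OUT_TOKENS, 0.04, 0.13)

print(f"Mistral Nemo: ${nemo:.6f} per request")
print(f"Gemma 3 12B:  ${gemma:.6f} per request")
print(f"1M requests:  ${nemo * 1_000_000:,.0f} vs ${gemma * 1_000_000:,.0f}")
```

Because RAG prompts are input-heavy, the input price dominates the total, which is why this page ranks by input price.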

Most expensive: Gemma 3 12B, $0.04 per 1M input tokens (the highest input price among the 40 models listed; several others also list $0.04)

Frequently asked questions

Is 128K enough?

For most production use cases, yes. 128K tokens fits 300+ pages of prose, a full API spec, or a large function library. Reach for 200K+ only when your prompts regularly exceed 100K tokens.
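The "300+ pages" figure follows from two rules of thumb. A rough sketch, assuming about 0.75 English words per token and about 300 words per printed page; both ratios are common approximations rather than tokenizer facts, and real ratios vary by model and language. `pages_that_fit` is a hypothetical helper.

```python
# Rules of thumb (assumptions): ~0.75 words per token, ~300 words per page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300


def pages_that_fit(context_tokens: int, reserved_tokens: int = 0) -> float:
    """Estimate how many pages of prose fit in the window after reserving
    tokens for instructions and the model's output."""
    usable = max(context_tokens - reserved_tokens, 0)
    return usable * WORDS_PER_TOKEN / WORDS_PER_PAGE


print(pages_that_fit(128_000))                         # 320.0
print(pages_that_fit(128_000, reserved_tokens=8_000))  # 300.0
```

Reserving room for the system prompt and the answer still leaves roughly 300 pages, which is where the estimate above comes from.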

