
Cheapest 128K context LLMs

Every LLM with a 128,000+ token context window, ranked by input price per 1M tokens.

Models: 40 · Cheapest: $0.00 (free tier) · Min context: 128K tokens
What this page is
128K is the modern baseline for LLM context. Every model priced per 1M tokens at 128K+ is listed here, cheapest first. For RAG pipelines, long conversations, and mid-sized documents, this is the sweet spot.
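Per-1M-token pricing makes per-request costs easy to estimate. A minimal sketch, using the Granite 4.0 Micro prices from the table below ($0.02 in, $0.11 out); swap in any row's rates:

```python
# Estimate the dollar cost of a single LLM call from per-1M-token prices.

def request_cost(in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for one request, given $/1M-token rates."""
    return (in_tokens / 1_000_000) * in_price_per_m \
         + (out_tokens / 1_000_000) * out_price_per_m

# A large 100K-token prompt with a short answer, at $0.02 in / $0.11 out:
cost = request_cost(100_000, 1_000, 0.02, 0.11)
print(f"${cost:.5f}")  # → $0.00211
```

Even a near-window-filling prompt costs a fraction of a cent at this end of the list.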

128K+ context models, cheapest first.

| # | Provider | Model | In $/1M | Out $/1M | Type |
|---|----------|-------|---------|----------|------|
| 1 | OpenRouter | Auto Router | n/a | n/a | Closed |
| 2 | OpenRouter | Body Builder (beta) | n/a | n/a | Closed |
| 3 | OpenRouter | Elephant | $0.00 | $0.00 | Closed |
| 4 | OpenRouter | Free Models Router | $0.00 | $0.00 | Closed |
| 5 | Google DeepMind | Gemma 3 27B (free) | $0.00 | $0.00 | OSS |
| 6 | Google DeepMind | Gemma 4 26B A4B (free) | $0.00 | $0.00 | OSS |
| 7 | Google DeepMind | Gemma 4 31B (free) | $0.00 | $0.00 | OSS |
| 8 | z-ai | GLM 4.5 Air (free) | $0.00 | $0.00 | OSS |
| 9 | OpenAI | gpt-oss-120b (free) | $0.00 | $0.00 | OSS |
| 10 | OpenAI | gpt-oss-20b (free) | $0.00 | $0.00 | OSS |
| 11 | nousresearch | Hermes 3 405B Instruct (free) | $0.00 | $0.00 | OSS |
| 12 | Meta | Llama 3.2 3B Instruct (free) | $0.00 | $0.00 | OSS |
| 13 | Google DeepMind | Lyria 3 Clip Preview | $0.00 | $0.00 | Closed |
| 14 | Google DeepMind | Lyria 3 Pro Preview | $0.00 | $0.00 | Closed |
| 15 | minimax | MiniMax M2.5 (free) | $0.00 | $0.00 | OSS |
| 16 | Mistral AI | Mistral Small 3.1 24B (free) | $0.00 | $0.00 | OSS |
| 17 | NVIDIA | Nemotron 3 Nano 30B A3B (free) | $0.00 | $0.00 | OSS |
| 18 | NVIDIA | Nemotron 3 Super (free) | $0.00 | $0.00 | OSS |
| 19 | NVIDIA | Nemotron Nano 12B 2 VL (free) | $0.00 | $0.00 | OSS |
| 20 | NVIDIA | Nemotron Nano 9B V2 (free) | $0.00 | $0.00 | OSS |
| 21 | Alibaba Qwen | Qwen3 Coder 480B A35B (free) | $0.00 | $0.00 | OSS |
| 22 | Alibaba Qwen | Qwen3 Next 80B A3B Instruct (free) | $0.00 | $0.00 | OSS |
| 23 | Alibaba Qwen | Qwen3.6 Plus (free) | $0.00 | $0.00 | Closed |
| 24 | Alibaba Qwen | Qwen3.6 Plus Preview (free) | $0.00 | $0.00 | OSS |
| 25 | stepfun | Step 3.5 Flash (free) | $0.00 | $0.00 | OSS |
| 26 | arcee-ai | Trinity Large Preview (free) | $0.00 | $0.00 | OSS |
| 27 | arcee-ai | Trinity Mini (free) | $0.00 | $0.00 | OSS |
| 28 | ibm-granite | Granite 4.0 Micro | $0.02 | $0.11 | OSS |
| 29 | Mistral AI | Mistral Nemo | $0.02 | $0.04 | OSS |
| 30 | OpenAI | gpt-oss-20b | $0.03 | $0.14 | OSS |
| 31 | Alibaba Qwen | Qwen-Turbo | $0.03 | $0.13 | OSS |
| 32 | Amazon | Nova Micro 1.0 | $0.04 | $0.14 | Closed |
| 33 | Cohere | Command R7B (12-2024) | $0.04 | $0.15 | Closed |
| 34 | OpenAI | gpt-oss-120b | $0.04 | $0.19 | OSS |
| 35 | Google DeepMind | Gemma 3 12B | $0.04 | $0.13 | OSS |
| 36 | Google DeepMind | Gemma 3 4B | $0.04 | $0.08 | OSS |
| 37 | NVIDIA | Nemotron Nano 9B V2 | $0.04 | $0.16 | OSS |
| 38 | arcee-ai | Trinity Mini | $0.04 | $0.15 | OSS |
| 39 | OpenAI | GPT-5 Nano | $0.05 | $0.40 | Closed |
| 40 | NVIDIA | Nemotron 3 Nano 30B A3B | $0.05 | $0.20 | OSS |
Cheapest: free-tier models at $0.00 per 1M input tokens
Why the gap

At 128K, premium pricing pays for reasoning quality and vendor reliability, not window size. For RAG backbones, the cheap end almost always wins.
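The gap compounds at scale. A sketch of a monthly RAG bill at the cheap end of this list versus a pricier model: $0.02/M is Mistral Nemo's input price from the table, while the $2.50/M premium rate and the traffic figures are illustrative assumptions, not data from this page.

```python
# Compare the monthly input-token bill for a RAG backbone at two price points.

CHEAP_IN = 0.02    # $/1M input tokens (Mistral Nemo, from the table)
PREMIUM_IN = 2.50  # $/1M input tokens (assumed premium rate, for illustration)

calls_per_day = 10_000
tokens_per_call = 8_000  # assumed RAG prompt: query + retrieved chunks
monthly_tokens = calls_per_day * tokens_per_call * 30  # 2.4B tokens

cheap_bill = monthly_tokens / 1_000_000 * CHEAP_IN
premium_bill = monthly_tokens / 1_000_000 * PREMIUM_IN
print(cheap_bill, premium_bill)  # → 48.0 6000.0
```

Identical traffic, identical window size: the only variable is the per-token rate, which is why the cheap end wins for high-volume retrieval workloads.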

Most expensive: GPT-5 Nano at $0.05 per 1M input tokens
Is 128K enough? For most production use cases, yes. 128K fits 300+ pages, a full API spec, or a large function library. Only reach for 200K+ when you regularly exceed 100K-token prompts.
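A back-of-the-envelope check of the "300+ pages" claim, assuming a rough heuristic of ~400 tokens per page of English prose (actual counts vary by tokenizer and layout):

```python
# Rough check: does a document of N pages fit in a 128K context window?

TOKENS_PER_PAGE = 400  # rough heuristic for English prose, not an exact figure
CONTEXT = 128_000

def fits(pages: int, context: int = CONTEXT) -> bool:
    """True if the estimated token count fits in the context window."""
    return pages * TOKENS_PER_PAGE <= context

print(fits(300))  # → True  (300 * 400 = 120,000 tokens)
print(fits(350))  # → False (350 * 400 = 140,000 tokens)
```

Denser content (code, tables, non-Latin scripts) tokenizes heavier, so leave headroom for the system prompt and the model's output.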