
Cheapest 128K context LLMs

Every LLM with a 128,000+ token context window. Ranked by input price per 1M tokens.

Models: 40
Cheapest: $-1000000.00
Min context: 128K tokens
What this page is
128K is the modern baseline for LLM context. Every model with a 128K+ window that is priced per 1M tokens is listed here, cheapest first. For RAG pipelines, long conversations, and mid-sized documents, this is the sweet spot.
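The ranking described above can be sketched in a few lines. This is a minimal illustration, not the page's actual implementation: the `Model` record, the `rank_by_input_price` helper, and the context sizes are hypothetical, while the prices are copied from the list. It also assumes that a negative price such as $-1000000.00 is a placeholder (for example, a router whose price depends on the model it selects) and sorts those entries last rather than first.

```python
from dataclasses import dataclass

MIN_CONTEXT = 128_000  # the page's stated minimum context window


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int        # illustrative values, not taken from the page
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def rank_by_input_price(models, min_context=MIN_CONTEXT):
    """Keep models with at least min_context tokens, cheapest input first.

    Assumption: a negative price is a placeholder, so such entries are
    pushed to the end instead of being treated as the cheapest.
    """
    eligible = [m for m in models if m.context_tokens >= min_context]
    return sorted(
        eligible,
        key=lambda m: (m.input_price_per_m < 0, m.input_price_per_m, m.name),
    )


models = [
    Model("Auto Router", 128_000, -1_000_000.00, -1_000_000.00),
    Model("Mistral Nemo", 128_000, 0.02, 0.03),
    Model("Gemma 3 27B (free)", 128_000, 0.00, 0.00),
    Model("Hypothetical 32K model", 32_000, 0.01, 0.01),  # filtered out
]

ranked = rank_by_input_price(models)
for position, model in enumerate(ranked, start=1):
    print(position, model.name, model.input_price_per_m)
```

Sorting on a tuple keeps the rule explicit: placeholder flag first, then price, then name as a tie-breaker for the many $0.00 entries.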

128K+ context models, cheapest first.

# | Provider | Model | In $/1M | Out $/1M | Type
1 | openrouter | Auto Router | $-1000000.00 | $-1000000.00 | Closed
2 | openrouter | Body Builder (beta) | $-1000000.00 | $-1000000.00 | Closed
3 | openrouter | Pareto Code Router | $-1000000.00 | $-1000000.00 | Closed
4 | openrouter | Elephant | $0.00 | $0.00 | Closed
5 | openrouter | Free Models Router | $0.00 | $0.00 | Closed
6 | Google DeepMind | Gemma 3 27B (free) | $0.00 | $0.00 | OSS
7 | Google DeepMind | Gemma 4 26B A4B (free) | $0.00 | $0.00 | OSS
8 | Google DeepMind | Gemma 4 31B (free) | $0.00 | $0.00 | OSS
9 | z-ai | GLM 4.5 Air (free) | $0.00 | $0.00 | OSS
10 | OpenAI | gpt-oss-120b (free) | $0.00 | $0.00 | OSS
11 | OpenAI | gpt-oss-20b (free) | $0.00 | $0.00 | OSS
12 | nousresearch | Hermes 3 405B Instruct (free) | $0.00 | $0.00 | OSS
13 | tencent | Hy3 preview (free) | $0.00 | $0.00 | Closed
14 | Meta | Llama 3.2 3B Instruct (free) | $0.00 | $0.00 | OSS
15 | Meta | Llama Guard 4 12B (free) | $0.00 | $0.00 | Closed
16 | Google DeepMind | Lyria 3 Clip Preview | $0.00 | $0.00 | Closed
17 | Google DeepMind | Lyria 3 Pro Preview | $0.00 | $0.00 | Closed
18 | minimax | MiniMax M2.5 (free) | $0.00 | $0.00 | OSS
19 | Mistral AI | Mistral Small 3.1 24B (free) | $0.00 | $0.00 | OSS
20 | NVIDIA | Nemotron 3 Nano 30B A3B (free) | $0.00 | $0.00 | OSS
21 | NVIDIA | Nemotron 3 Nano Omni (free) | $0.00 | $0.00 | Closed
22 | NVIDIA | Nemotron 3 Super (free) | $0.00 | $0.00 | OSS
23 | NVIDIA | Nemotron Nano 12B 2 VL (free) | $0.00 | $0.00 | OSS
24 | NVIDIA | Nemotron Nano 9B V2 (free) | $0.00 | $0.00 | OSS
25 | openrouter | Owl Alpha | $0.00 | $0.00 | Closed
26 | Alibaba Qwen | Qwen3 Coder 480B A35B (free) | $0.00 | $0.00 | OSS
27 | Alibaba Qwen | Qwen3 Next 80B A3B Instruct (free) | $0.00 | $0.00 | OSS
28 | Alibaba Qwen | Qwen3.6 Plus (free) | $0.00 | $0.00 | Closed
29 | Alibaba Qwen | Qwen3.6 Plus Preview (free) | $0.00 | $0.00 | OSS
30 | stepfun | Step 3.5 Flash (free) | $0.00 | $0.00 | OSS
31 | arcee-ai | Trinity Large Preview (free) | $0.00 | $0.00 | OSS
32 | arcee-ai | Trinity Mini (free) | $0.00 | $0.00 | OSS
33 | ibm-granite | Granite 4.0 Micro | $0.02 | $0.11 | OSS
34 | Mistral AI | Mistral Nemo | $0.02 | $0.03 | OSS
35 | OpenAI | gpt-oss-20b | $0.03 | $0.14 | OSS
36 | Alibaba Qwen | Qwen-Turbo | $0.03 | $0.13 | OSS
37 | Amazon | Nova Micro 1.0 | $0.04 | $0.14 | Closed
38 | Cohere | Command R7B (12-2024) | $0.04 | $0.15 | Closed
39 | OpenAI | gpt-oss-120b | $0.04 | $0.18 | OSS
40 | Google DeepMind | Gemma 3 12B | $0.04 | $0.13 | OSS
Cheapest: Auto Router, $-1000000.00 per 1M input tokens
Why the gap

At 128K, premium pricing pays for reasoning quality and vendor reliability, not window size. For RAG backbones, the cheap end almost always wins.
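To make the price gap concrete, per-1M prices can be converted into a per-request cost. The sketch below assumes a hypothetical helper, `request_cost_usd`, and an illustrative RAG request of 100,000 input tokens and 1,000 output tokens. The prices are the Mistral Nemo and Gemma 3 12B rows from the list; the request size is an assumption, not a figure from this page.

```python
def request_cost_usd(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one request in USD, given prices quoted per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * in_price_per_m
    output_cost = output_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost


# Illustrative request size (assumption): a long RAG prompt, short answer.
INPUT_TOKENS = 100_000
OUTPUT_TOKENS = 1_000

# Prices taken from the list: (input $/1M, output $/1M).
mistral_nemo = request_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.02, 0.03)
gemma_3_12b = request_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.04, 0.13)

print(f"Mistral Nemo: ${mistral_nemo:.5f} per request")
print(f"Gemma 3 12B:  ${gemma_3_12b:.5f} per request")
```

With prompts this long, the input price dominates the total, which is why the list is ranked on input price rather than output price.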

Most expensive (of the 40 listed): Gemma 3 12B, $0.04 per 1M input tokens
For most production use cases, 128K is enough. It fits 300+ pages, a full API spec, or a large function library. Only reach for 200K+ when your prompts regularly exceed 100K tokens.
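The 100K-token rule of thumb can be checked against logged prompt sizes. This is a rough sketch under stated assumptions: the helper name `needs_larger_window` is hypothetical, and "regularly" is interpreted here as at least 25% of prompts, a threshold chosen for illustration rather than taken from this page.

```python
def needs_larger_window(prompt_token_counts, threshold=100_000, regular_share=0.25):
    """Return True if enough prompts exceed the threshold to justify 200K+.

    regular_share is an assumed cut-off for what counts as "regularly";
    tune it to your own workload.
    """
    if not prompt_token_counts:
        return False
    over = sum(1 for tokens in prompt_token_counts if tokens > threshold)
    return over / len(prompt_token_counts) >= regular_share


# Hypothetical logs of prompt sizes, in tokens.
mostly_small = [20_000] * 9 + [110_000]            # 10% over the threshold
often_large = [20_000, 50_000, 110_000, 120_000]   # 50% over the threshold

print(needs_larger_window(mostly_small))  # False
print(needs_larger_window(often_large))   # True
```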