Why pick a 32K model over a 128K one?

Often there is no reason to, since 128K models usually cost the same per token. Some cheap legacy models are 32K-only, though. If one of them meets your quality bar, use it.

How many words fit in 32K tokens?

Roughly 24,000 English words or 45 pages of plain text. That is a long blog post, a mid-sized PDF, or several hours of chat history.
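The arithmetic behind that estimate can be sketched as follows. The ratios used here (0.75 English words per token, 500 words per page) are common rules of thumb assumed for illustration, not values published on this page; real counts depend on the tokenizer and the text.

```python
def estimate_capacity(tokens: int,
                      words_per_token: float = 0.75,
                      words_per_page: int = 500) -> tuple[int, int]:
    """Rough (words, pages) estimate for a token budget.

    Both ratios are assumed rules of thumb for plain English prose.
    """
    words = round(tokens * words_per_token)
    pages = words // words_per_page
    return words, pages


words, pages = estimate_capacity(32_000)
print(f"{words:,} words, about {pages} pages")
```

With these assumed ratios the page count comes out slightly above the 45 quoted above; a denser page (around 530 words) brings the two figures together.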

Does 32K save memory at inference time?

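Yes, if you are self-hosting. KV cache memory grows linearly with the number of tokens held in context, so a full 32K window needs roughly a quarter of the cache that a full 128K window needs on the same architecture. For GPU-constrained deployments, pick smaller windows and chunk smartly.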
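A back-of-the-envelope version of that comparison, assuming a standard transformer that stores one key and one value tensor per layer. The model shape below (32 layers, 8 KV heads, head dimension 128, fp16) is hypothetical and only there to make the numbers concrete.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """KV cache size for one full window.

    The leading 2 accounts for keys plus values; bytes_per_elem=2 assumes fp16/bf16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


# Hypothetical model shape, not taken from any model on this page.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)
GIB = 1024 ** 3

small = kv_cache_bytes(32_768, **cfg)    # full 32K window
large = kv_cache_bytes(131_072, **cfg)   # full 128K window
print(f"32K: {small / GIB:.1f} GiB, 128K: {large / GIB:.1f} GiB, ratio {large // small}x")
```

The saving only materializes when the window is actually filled: a 128K model serving 32K-token requests holds the same cache as a 32K model of the same shape.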

Is 32K context still relevant in 2026?

Less so for new launches, but many production systems still use 32K models for chat, classification, and light RAG. They're cheap, fast, and good enough.

Cheapest 32K context LLMs

Every LLM with at least 32K token context. Ranked by input price per 1M tokens.

Models: 50

Cheapest: $0.00 (excluding router entries, whose listed $-1000000.00 is a placeholder rather than a price)

Min context: 32K tokens

What this page is

32K context is the floor for modern LLM use. This tier captures the broadest set of priced models at one of the deepest discounts. Ideal for chat, short-doc RAG, classification, and any workload where you do not need a huge window.

Ranked by input price

32K+ context models, cheapest first.

#	Model	Provider	In $/1M	Out $/1M	Context	Type
1	Auto Router	openrouter	variable	variable	2.0M	Closed
2	Body Builder (beta)	openrouter	variable	variable	128K	Closed
3	Pareto Code Router	openrouter	variable	variable	200K	Closed
4	Elephant	openrouter	$0.00	$0.00	262K	Closed
5	Free Models Router	openrouter	$0.00	$0.00	200K	Closed
6	Gemma 3 12B (free)	Google DeepMind	$0.00	$0.00	33K	OSS
7	Gemma 3 27B (free)	Google DeepMind	$0.00	$0.00	131K	OSS
8	Gemma 3 4B (free)	Google DeepMind	$0.00	$0.00	33K	OSS
9	Gemma 4 26B A4B (free)	Google DeepMind	$0.00	$0.00	262K	OSS
10	Gemma 4 31B (free)	Google DeepMind	$0.00	$0.00	262K	OSS
11	GLM 4.5 Air (free)	z-ai	$0.00	$0.00	131K	OSS
12	gpt-oss-120b (free)	OpenAI	$0.00	$0.00	131K	OSS
13	gpt-oss-20b (free)	OpenAI	$0.00	$0.00	131K	OSS
14	Hermes 3 405B Instruct (free)	nousresearch	$0.00	$0.00	131K	OSS
15	Hy3 preview (free)	tencent	$0.00	$0.00	262K	Closed
16	LFM2.5-1.2B-Instruct (free)	liquid	$0.00	$0.00	33K	OSS
17	LFM2.5-1.2B-Thinking (free)	liquid	$0.00	$0.00	33K	OSS
18	Llama 3.2 3B Instruct (free)	Meta	$0.00	$0.00	131K	OSS
19	Llama 3.3 70B Instruct (free)	Meta	$0.00	$0.00	66K	OSS
20	Llama Guard 4 12B (free)	Meta	$0.00	$0.00	164K	Closed
21	Lyria 3 Clip Preview	Google DeepMind	$0.00	$0.00	1.0M	Closed
22	Lyria 3 Pro Preview	Google DeepMind	$0.00	$0.00	1.0M	Closed
23	MiniMax M2.5 (free)	minimax	$0.00	$0.00	197K	OSS
24	Mistral Small 3.1 24B (free)	Mistral AI	$0.00	$0.00	128K	OSS
25	Nemotron 3 Nano 30B A3B (free)	NVIDIA	$0.00	$0.00	256K	OSS
26	Nemotron 3 Nano Omni (free)	NVIDIA	$0.00	$0.00	256K	Closed
27	Nemotron 3 Super (free)	NVIDIA	$0.00	$0.00	262K	OSS
28	Nemotron Nano 12B 2 VL (free)	NVIDIA	$0.00	$0.00	128K	OSS
29	Nemotron Nano 9B V2 (free)	NVIDIA	$0.00	$0.00	128K	OSS
30	Owl Alpha	openrouter	$0.00	$0.00	1.0M	Closed
31	Qianfan-OCR-Fast (free)	baidu	$0.00	$0.00	66K	Closed
32	Qwen3 4B (free)	Alibaba Qwen	$0.00	$0.00	41K	OSS
33	Qwen3 Coder 480B A35B (free)	Alibaba Qwen	$0.00	$0.00	262K	OSS
34	Qwen3 Next 80B A3B Instruct (free)	Alibaba Qwen	$0.00	$0.00	262K	OSS
35	Qwen3.6 Plus (free)	Alibaba Qwen	$0.00	$0.00	1.0M	Closed
36	Qwen3.6 Plus Preview (free)	Alibaba Qwen	$0.00	$0.00	1.0M	OSS
37	Step 3.5 Flash (free)	stepfun	$0.00	$0.00	256K	OSS
38	Trinity Large Preview (free)	arcee-ai	$0.00	$0.00	131K	OSS
39	Trinity Mini (free)	arcee-ai	$0.00	$0.00	131K	OSS
40	Uncensored (free)	cognitivecomputations	$0.00	$0.00	33K	OSS
41	LFM2-2.6B	liquid	$0.01	$0.02	33K	OSS
42	LFM2-8B-A1B	liquid	$0.01	$0.02	33K	OSS
43	Granite 4.0 Micro	ibm-granite	$0.02	$0.11	131K	OSS
44	Mistral Nemo	Mistral AI	$0.02	$0.03	131K	OSS
45	Llama 3.2 1B Instruct	Meta	$0.03	$0.20	60K	OSS
46	gpt-oss-20b	OpenAI	$0.03	$0.14	131K	OSS
47	LFM2-24B-A2B	liquid	$0.03	$0.12	33K	OSS
48	Qwen2.5 Coder 7B Instruct	Alibaba Qwen	$0.03	$0.09	33K	OSS
49	Qwen-Turbo	Alibaba Qwen	$0.03	$0.13	131K	OSS
50	Nova Micro 1.0	Amazon	$0.04	$0.14	128K	Closed
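A ranking like the one above can be reproduced with a filter and a sort. The sketch below is illustrative only: the `Model` record, the `cheapest` helper, and the rule that a negative input price means "no fixed price" are all assumptions, not this site's actual implementation. The sample rows use context sizes rounded as in the table.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_price: float  # USD per 1M input tokens; negative = no fixed price (assumed)
    context: int        # context window in tokens


def cheapest(models: list[Model], min_context: int = 32_000,
             limit: int = 50) -> list[Model]:
    """Models with at least min_context tokens and a real price, cheapest first."""
    eligible = [m for m in models
                if m.context >= min_context and m.input_price >= 0]
    # Ties on price are broken alphabetically (an assumption, for stable output).
    return sorted(eligible, key=lambda m: (m.input_price, m.name.lower()))[:limit]


rows = [
    Model("Nova Micro 1.0", 0.04, 128_000),
    Model("Auto Router", -1_000_000.00, 2_000_000),  # placeholder price, skipped
    Model("LFM2-2.6B", 0.01, 33_000),
    Model("Elephant", 0.00, 262_000),
]

for rank, m in enumerate(cheapest(rows), start=1):
    print(f"{rank}. {m.name}: ${m.input_price:.2f}/M, {m.context:,} tokens")
```

Filtering out negative prices before sorting keeps placeholder entries from taking the top of a cheapest-first ranking.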

Top 3 cheapest 32K+ context LLMs

Auto Router: variable input price (the listed $-1000000.00/M is a placeholder, not a rate), 2.0M context · fine for most chat and RAG loads.

Body Builder (beta): variable input price (same placeholder), 128K context · fine for most chat and RAG loads.

Pareto Code Router: variable input price (same placeholder), 200K context · fine for most chat and RAG loads.

The price gap · cheapest vs most expensive

Cheapest

Auto Router

Variable (listed as $-1000000.00/M, a placeholder rather than a price)

$ per 1M input tokens

Why the gap

At this tier, price reflects raw model quality more than context size. The cheapest 32K model is often a small open-source model; the most expensive is usually a frontier model deliberately billed the same across windows.

Most expensive (of the 50 listed)

Nova Micro 1.0

$0.04/M

$ per 1M input tokens

Frequently asked questions

Is 32K context enough?

For most applications, yes. Chat histories, RAG retrievals, and single-document QA rarely exceed 16K tokens. 32K leaves plenty of headroom.
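One way to check headroom for a specific workload is to add up the parts of a request and subtract them from the window. The token counts below are hypothetical, and the window is assumed to be 32,768 tokens shared between prompt and output.

```python
def headroom(parts: dict[str, int], max_output_tokens: int,
             window: int = 32_768) -> int:
    """Tokens left after the prompt parts and the reserved output.

    A negative result means the request would overflow the window.
    """
    return window - sum(parts.values()) - max_output_tokens


# Hypothetical request budget, in tokens.
request = {
    "system": 500,
    "chat_history": 6_000,
    "retrieved_chunks": 8_000,
    "question": 200,
}

left = headroom(request, max_output_tokens=1_300)
print(f"{left:,} tokens of headroom")
```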
