Input Tokens
The tokens in your prompt · billed per million, typically 3-5× cheaper than output tokens.
Basic
When you call an LLM API, everything you send (system prompt + user message + retrieved docs + conversation history) counts as input tokens. Providers charge per million input tokens. GPT-5: $10/M in. Claude 4.5 Opus: $15/M in. DeepSeek V3: $0.27/M in. Cheapest option: self-hosted Llama 4, which has no per-token fee (you pay for compute instead).
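A minimal sketch of the per-million billing model, using the illustrative prices above (the price table is an assumption pulled from this card, not a live price feed):

```python
# Illustrative $/M input prices from the figures above.
PRICE_PER_M = {"gpt-5": 10.00, "claude-4.5-opus": 15.00, "deepseek-v3": 0.27}

def input_cost(tokens: int, model: str) -> float:
    """Dollar cost of `tokens` input tokens for `model`."""
    return tokens / 1_000_000 * PRICE_PER_M[model]

# A 4K-token prompt (system + history + retrieved docs):
print(round(input_cost(4_000, "deepseek-v3"), 6))  # 0.00108
print(round(input_cost(4_000, "claude-4.5-opus"), 2))  # 0.06
```

Same prompt, ~55× price spread between providers · which is why model choice is itself a cost lever.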
Deep
Input token cost scales with prompt length: long system prompts + full retrieval payloads + long conversation history get expensive fast. Optimizations: prompt compression, prompt caching (Claude: 90% off a repeated prefix; OpenAI: 50% off cached input), and RAG with tighter context windows. Most production apps see 70-85% of total token spend go to input, despite paying more per output token, because inputs are typically 5-20× longer than outputs.
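The 70-85% claim falls straight out of the arithmetic. A hedged sketch, assuming a typical 3× output-to-input price ratio ($10/M in, $30/M out · illustrative numbers, not any specific provider's list price):

```python
def input_share(input_tokens: int, output_tokens: int,
                in_price: float = 10.0, out_price: float = 30.0) -> float:
    """Fraction of the total token bill attributable to input."""
    in_cost = input_tokens / 1e6 * in_price
    out_cost = output_tokens / 1e6 * out_price
    return in_cost / (in_cost + out_cost)

# 10x longer input than output: input is still ~77% of the bill,
# even though each output token costs 3x more.
print(round(input_share(10_000, 1_000), 3))  # 0.769
```

At a 5× length ratio the input share is ~62%; at 20× it is ~87% · bracketing the 70-85% figure quoted above.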
Expert
Input tokens are processed in parallel during the prefill phase · compute-bound, not memory-bound. This is why input is cheap per token relative to output · you can batch prefill work efficiently across many requests. Cost accounting: the cheapest way to run long-context workloads is aggressive prompt caching + retrieval-pruned context. Cache-aware architectures (stable system prompt + volatile user context) let you pay 10× less for the same effective context.
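The cache-aware accounting above can be sketched as a stable prefix billed at a cache-read discount plus a small volatile suffix at full price. A sketch assuming Claude-style pricing ($15/M) and a 90% cache-read discount; the function name and token counts are hypothetical:

```python
def cached_request_cost(prefix_tokens: int, suffix_tokens: int,
                        price_per_m: float = 15.0,
                        cache_discount: float = 0.90) -> float:
    """Cost of one request with a cached stable prefix and fresh suffix."""
    cached = prefix_tokens / 1e6 * price_per_m * (1 - cache_discount)
    fresh = suffix_tokens / 1e6 * price_per_m
    return cached + fresh

# 20K-token stable system prompt + 500-token volatile user context:
full = (20_000 + 500) / 1e6 * 15.0        # no caching: $0.3075/request
cached = cached_request_cost(20_000, 500)  # cache hit:  $0.0375/request
print(round(full / cached, 1))  # 8.2
```

The saving approaches 10× as the stable prefix dominates the prompt · hence "stable system prompt + volatile user context" as an architecture, not just a billing trick. (Real cache pricing also adds a one-time cache-write premium, omitted here.)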
Depending on why you're here
- Processed in parallel during prefill · compute-bound
- Cacheable across turns · stable prefix = 90% discount on Claude
- 5-20× longer than outputs in typical workloads
- Cache your system prompt if it's stable · 90% off on Claude
- Keep prompts tight · long context bills you per call
- Most cost wars are won or lost on input token discipline
- Input margin is structurally thin · providers recoup on output
- Cache infrastructure is a margin lever · Claude's 90% cache discount is aggressive
- Prompt caching can reshape competitive pricing positioning
- The stuff you send to the AI · shorter is cheaper
- Usually cheaper than what the AI sends back
- Why long conversations get expensive over time
Input token discipline separates teams that can scale from teams that can't. Cache aggressively.
A 500K-token/month workload at $10/M input = $5/month. At 50M tokens/month it's $500. Understanding this unit cost is the whole AI budgeting game.
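The unit-cost arithmetic above, spelled out (prices illustrative, matching the $10/M figure in the example):

```python
def monthly_input_cost(tokens_per_month: int, price_per_m: float = 10.0) -> float:
    """Monthly input spend at a flat $/M rate."""
    return tokens_per_month / 1e6 * price_per_m

print(monthly_input_cost(500_000))     # 5.0   -- hobby-scale
print(monthly_input_cost(50_000_000))  # 500.0 -- 100x the volume, 100x the bill
```

Spend scales linearly with volume · there is no bulk discount in the unit price itself, so the only levers are fewer tokens, cheaper models, and caching.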