Pricing

Input Tokens

The tokens in your prompt · billed per million, typically 3-5× cheaper than output tokens.


Level 1

When you call an LLM API, everything you send (system prompt + user message + retrieved docs + conversation history) counts as input tokens. Providers charge per million input tokens. GPT-5: $10/M in. Claude 4.5 Opus: $15/M in. DeepSeek V3: $0.27/M in. Cheapest tier: self-hosted Llama 4, where the per-token price is zero (you pay for the compute instead).
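The per-million pricing above reduces to simple arithmetic. A minimal sketch, using the rates quoted in this section (illustrative; check live pricing pages before budgeting):

```python
# Per-million input-token rates quoted above (illustrative, not live prices).
RATES_PER_M = {
    "gpt-5": 10.00,
    "claude-4.5-opus": 15.00,
    "deepseek-v3": 0.27,
}

def input_cost(tokens: int, model: str) -> float:
    """Dollar cost of sending `tokens` input tokens to `model`."""
    return tokens / 1_000_000 * RATES_PER_M[model]

# The same 4,000-token prompt produces three very different bills:
for model in RATES_PER_M:
    print(f"{model}: ${input_cost(4_000, model):.5f}")
# gpt-5: $0.04000
# claude-4.5-opus: $0.06000
# deepseek-v3: $0.00108
```

The spread matters: a ~37× gap between the priciest and cheapest per-token rate compounds over every call.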

Level 2

Input token cost scales with prompt length. Long system prompts + full retrieval + long conversation history = expensive. Optimizations: prompt compression, prompt caching (Claude: 90% off a repeated prefix, OpenAI: 50% off cached input), and RAG with smaller context windows. Most production apps see 70-85% of total token spend go to input, despite output's higher per-token price, because inputs are typically 5-20× longer than outputs.
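The caching discount folds into the same per-million arithmetic. A sketch assuming a 90% discount on a cached prefix (the token counts and rate are illustrative):

```python
def cached_input_cost(prefix_tokens: int, volatile_tokens: int,
                      rate_per_m: float, cache_discount: float) -> float:
    """Input cost for one call when the stable prefix hits the cache.

    The cached prefix is billed at (1 - discount) of the normal rate;
    the volatile tail (user message, retrieved docs) is billed in full.
    """
    billed = prefix_tokens * (1 - cache_discount) + volatile_tokens
    return billed / 1_000_000 * rate_per_m

# 8,000-token stable system prompt + 500-token user turn at $15/M:
uncached = cached_input_cost(8_000, 500, 15.0, cache_discount=0.0)
cached = cached_input_cost(8_000, 500, 15.0, cache_discount=0.9)
print(uncached, cached)  # 0.1275 0.0195 — roughly 6.5x cheaper per call
```

Note the savings grow with the stable-to-volatile ratio: the longer the fixed prefix relative to the per-turn tail, the closer the effective discount gets to the full 90%.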

Level 3

Input tokens are processed in parallel during the prefill phase · compute-bound, not memory-bound. This is why input is cheap per token relative to output · you can batch prefill work efficiently across many requests. Cost accounting: the cheapest way to run long-context workloads is aggressive prompt caching + retrieval-pruned context. Cache-aware architectures (stable system prompt + volatile user context) let you pay 10× less for the same effective context.
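The "stable system prompt + volatile user context" layout can be sketched as prompt assembly. This is provider-agnostic Python, not any specific API; the message shape and names are assumptions:

```python
# Prefix caches match from the first byte, so everything stable must come
# first and everything volatile (retrieved docs, the user's question,
# timestamps) must come last — otherwise the cache never hits.

STABLE_SYSTEM = (
    "You are a support assistant...\n"  # hypothetical system prompt
    "Policies, tool definitions, and few-shot examples live here too."
)

def build_messages(retrieved_docs: list[str], user_query: str) -> list[dict]:
    stable = {"role": "system", "content": STABLE_SYSTEM}  # cacheable prefix
    volatile = {
        "role": "user",  # changes every call; billed at full rate
        "content": "\n\n".join(retrieved_docs) + "\n\n" + user_query,
    }
    return [stable, volatile]

msgs = build_messages(["doc A", "doc B"], "Why was I billed twice?")
print(msgs[0]["role"])  # system
```

A common mistake that silently defeats caching: injecting a timestamp or request ID into the system prompt, which changes the prefix on every call.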

The takeaway for you
If you are a
Researcher
  • Processed in parallel during prefill · compute-bound
  • Cacheable across turns · stable prefix = 90% discount on Claude
  • 5-20× longer than outputs in typical workloads
If you are a
Builder
  • Cache your system prompt if it's stable · 90% off on Claude
  • Keep prompts tight · long context bills you per call
  • Most cost wars are won or lost on input token discipline
If you are a
Investor
  • Input margin is structurally thin · providers recoup on output
  • Cache infrastructure is a margin lever · Claude's 90% cache discount is aggressive
  • Prompt caching can reshape competitive pricing positioning
If you are a
Curious · Normie
  • The stuff you send to the AI · shorter is cheaper
  • Usually cheaper than what the AI sends back
  • Why long conversations get expensive over time
Gecko's take

Input token discipline separates teams that can scale from teams that can't. Cache aggressively.

The price of knowing this term

A 500K-token/month workload at $10/M input = $5/month. At 50M tokens/month it's $500. Understanding this unit cost is the whole AI budgeting game.
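The paragraph's numbers check out with a two-line helper:

```python
def monthly_input_cost(tokens_per_month: int, rate_per_m: float) -> float:
    """Monthly input bill at a given per-million rate."""
    return tokens_per_month / 1_000_000 * rate_per_m

print(monthly_input_cost(500_000, 10.0))     # 5.0   -> $5/month
print(monthly_input_cost(50_000_000, 10.0))  # 500.0 -> $500/month
```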

Levers that cut the bill: prompt caching (Claude, OpenAI), retrieval-pruned RAG context, shorter system prompts, tighter few-shot examples.