Input Tokens
The tokens in your prompt · billed per million, typically 3-5× cheaper than output tokens.
Basic
When you call an LLM API, everything you send (system prompt + user message + retrieved docs + conversation history) counts as input tokens. Providers charge per million input tokens. GPT-5: $10/M in. Claude 4.5 Opus: $15/M in. DeepSeek V3: $0.27/M in. Cheapest option: self-hosted Llama 4, which has no per-token fee (you pay for compute instead).
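A minimal sketch of the per-million billing model, using the illustrative prices above (the price table is an assumption pulled from this card, not a live price feed):

```python
# Illustrative $/M input prices from the figures above.
PRICE_PER_M = {"gpt-5": 10.00, "claude-4.5-opus": 15.00, "deepseek-v3": 0.27}

def input_cost(tokens: int, model: str) -> float:
    """Dollar cost of `tokens` input tokens for `model`."""
    return tokens / 1_000_000 * PRICE_PER_M[model]

# A 4K-token prompt (system + history + retrieved docs):
print(round(input_cost(4_000, "deepseek-v3"), 6))  # 0.00108
print(round(input_cost(4_000, "claude-4.5-opus"), 2))  # 0.06
```

Same prompt, ~55× price spread between providers · which is why model choice is itself a cost lever.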
Deep
Input token cost scales with prompt length: long system prompts + full retrieval payloads + long conversation history get expensive fast. Optimizations: prompt compression, prompt caching (Claude: 90% off a repeated prefix; OpenAI: 50% off cached input), and RAG with tighter context windows. Most production apps see 70-85% of total token spend go to input, despite paying more per output token, because inputs are typically 5-20× longer than outputs.
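The 70-85% claim falls straight out of the arithmetic. A hedged sketch, assuming a typical 3× output-to-input price ratio ($10/M in, $30/M out · illustrative numbers, not any specific provider's list price):

```python
def input_share(input_tokens: int, output_tokens: int,
                in_price: float = 10.0, out_price: float = 30.0) -> float:
    """Fraction of the total token bill attributable to input."""
    in_cost = input_tokens / 1e6 * in_price
    out_cost = output_tokens / 1e6 * out_price
    return in_cost / (in_cost + out_cost)

# 10x longer input than output: input is still ~77% of the bill,
# even though each output token costs 3x more.
print(round(input_share(10_000, 1_000), 3))  # 0.769
```

At a 5× length ratio the input share is ~62%; at 20× it is ~87% · bracketing the 70-85% figure quoted above.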
Expert
Input tokens are processed in parallel during the prefill phase · compute-bound, not memory-bound. This is why input is cheap per token relative to output · you can batch prefill work efficiently across many requests. Cost accounting: the cheapest way to run long-context workloads is aggressive prompt caching + retrieval-pruned context. Cache-aware architectures (stable system prompt + volatile user context) let you pay 10× less for the same effective context.
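The cache-aware accounting above can be sketched as a stable prefix billed at a cache-read discount plus a small volatile suffix at full price. A sketch assuming Claude-style pricing ($15/M) and a 90% cache-read discount; the function name and token counts are hypothetical:

```python
def cached_request_cost(prefix_tokens: int, suffix_tokens: int,
                        price_per_m: float = 15.0,
                        cache_discount: float = 0.90) -> float:
    """Cost of one request with a cached stable prefix and fresh suffix."""
    cached = prefix_tokens / 1e6 * price_per_m * (1 - cache_discount)
    fresh = suffix_tokens / 1e6 * price_per_m
    return cached + fresh

# 20K-token stable system prompt + 500-token volatile user context:
full = (20_000 + 500) / 1e6 * 15.0        # no caching: $0.3075/request
cached = cached_request_cost(20_000, 500)  # cache hit:  $0.0375/request
print(round(full / cached, 1))  # 8.2
```

The saving approaches 10× as the stable prefix dominates the prompt · hence "stable system prompt + volatile user context" as an architecture, not just a billing trick. (Real cache pricing also adds a one-time cache-write premium, omitted here.)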
Depending on why you're here
- Processed in parallel during prefill · compute-bound
- Cacheable across turns · stable prefix = 90% discount on Claude
- 5-20× longer than outputs in typical workloads
- Cache your system prompt if it's stable · 90% off on Claude
- Keep prompts tight · long context bills you per call
- Most cost wars are won or lost on input token discipline
- Input margin is structurally thin · providers recoup on output
- Cache infrastructure is a margin lever · Claude's 90% cache discount is aggressive
- Prompt caching can reshape competitive pricing positioning
- The stuff you send to the AI · shorter is cheaper
- Usually cheaper than what the AI sends back
- Why long conversations get expensive over time
Input token discipline separates teams that can scale from teams that can't. Cache aggressively.
A 500K-token/month workload at $10/M input = $5/month. At 50M tokens/month it's $500. Understanding this unit cost is the whole AI budgeting game.
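The unit-cost arithmetic above, spelled out (prices illustrative, matching the $10/M figure in the example):

```python
def monthly_input_cost(tokens_per_month: int, price_per_m: float = 10.0) -> float:
    """Monthly input spend at a flat $/M rate."""
    return tokens_per_month / 1e6 * price_per_m

print(monthly_input_cost(500_000))     # 5.0   -- hobby-scale
print(monthly_input_cost(50_000_000))  # 500.0 -- 100x the volume, 100x the bill
```

Spend scales linearly with volume · there is no bulk discount in the unit price itself, so the only levers are fewer tokens, cheaper models, and caching.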