
Output Tokens

The tokens the model writes back to you · priced 3-5× higher than input because decoding is sequential.


Level 1

Every AI API bills output tokens separately. GPT-5: $60/M out. Claude 4.5 Opus: $75/M out. Reasoning models charge output tokens for hidden reasoning too, so a single "answer" can burn 10,000+ billed tokens. Cheap tier: DeepSeek V3 at $1.10/M out.
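The per-answer math above is easy to sketch. A minimal cost estimator using the prices quoted in this section (the model keys are illustrative labels, not real API model IDs):

```python
# Output-token prices quoted above, in USD per 1M output tokens.
OUTPUT_PRICE_PER_M = {
    "gpt-5": 60.00,
    "claude-4.5-opus": 75.00,
    "deepseek-v3": 1.10,
}

def response_cost(model: str, output_tokens: int) -> float:
    """USD cost of one response, billed on output tokens only."""
    return output_tokens * OUTPUT_PRICE_PER_M[model] / 1_000_000

# A reasoning-style answer that burns 10,000 billed output tokens:
print(round(response_cost("gpt-5", 10_000), 2))        # 0.6
print(round(response_cost("deepseek-v3", 10_000), 4))  # 0.011
```

The same 10,000-token answer costs ~55× less on the cheap tier · which is why routing matters.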

Level 2

Output tokens are expensive because they're generated one at a time during the decode phase. Each token requires a full forward pass that streams the weights and KV cache from memory · decode is memory-bandwidth-bound. This is why faster GPUs and newer memory generations matter so much for inference economics. Speculative decoding yields 2-3× throughput on output tokens. FP8 and INT8 quantization of both weights and activations roughly doubles output throughput again.
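The bandwidth bound can be sketched with back-of-envelope arithmetic. This assumes the simplest case (batch size 1, every decode step streams all weights from HBM once; the 70B/3.35 TB/s figures are illustrative, not measured):

```python
# Rough decode throughput for a memory-bandwidth-bound model.
def tokens_per_second(param_count: float, bytes_per_param: float,
                      hbm_bandwidth_gbs: float) -> float:
    bytes_per_token = param_count * bytes_per_param  # weights streamed per decode step
    return hbm_bandwidth_gbs * 1e9 / bytes_per_token

# Hypothetical 70B-parameter dense model on a 3.35 TB/s (H100-class) GPU:
fp16 = tokens_per_second(70e9, 2, 3350)  # ~24 tok/s
fp8  = tokens_per_second(70e9, 1, 3350)  # ~48 tok/s
```

Halving bytes per parameter (FP16 → FP8) exactly doubles the bandwidth-bound throughput, which is why quantization shows up directly in output-token economics.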

Level 3

Output cost scales with model size × sequential token count ÷ cluster efficiency. The memory-bandwidth constraint on decode makes each generated token expensive relative to each input token. Reasoning models amplify this: o3 can burn 50,000+ reasoning tokens on a single hard problem · at $60/M that's $3 per answer. Optimization via prompt engineering (shorter replies, structured outputs) is often the highest-leverage cost control in production.
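The o3 example above, checked:

```python
# 50,000 reasoning tokens billed as output at $60 per 1M tokens.
reasoning_tokens = 50_000
price_per_million = 60.00
cost = reasoning_tokens * price_per_million / 1_000_000
print(cost)  # 3.0 (dollars per answer)
```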

The takeaway for you
If you are a
Researcher
  • Memory-bandwidth-bound: decoding is sequential
  • Speculative decoding + FP8 are the main throughput optimizations
  • Reasoning models produce 10-100× the output volume per answer
If you are a
Builder
  • Use structured outputs to cap response length
  • Reasoning models are expensive · route easy queries to cheap models
  • The max tokens parameter is your budget safety valve
If you are an
Investor
  • Output margin is where providers recoup input subsidies
  • The reasoning tier is a new premium output category
  • Cheap-tier competition (DeepSeek, Mistral) compresses output margins
If you are a
Curious · Normie
  • What the AI writes back · the expensive part
  • Why asking for long responses costs more
  • Reasoning AI pays for its "thinking" too
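The builder advice above can be sketched as a tiny router. Everything here is hypothetical (the model names, the keyword heuristic, and the idea that you pass the result to your API client): cap max tokens everywhere, and only pay reasoning-tier output prices where they pay off.

```python
# Hypothetical routing sketch: hard prompts go to a reasoning model with a
# larger (but still capped) token budget; everything else goes cheap and short.
def pick_model(prompt: str) -> dict:
    hard = any(k in prompt.lower() for k in ("prove", "derive", "debug"))
    return {
        "model": "reasoning-large" if hard else "cheap-small",
        "max_tokens": 4_096 if hard else 512,  # the budget safety valve
    }

print(pick_model("Summarize this email"))
# {'model': 'cheap-small', 'max_tokens': 512}
```

In production you'd use a classifier or a cheap model as the router rather than keyword matching, but the cost logic is the same.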
Gecko's take

Output token economics drove every major model architecture decision since 2024. MoE, reasoning tiers, speculative decoding · all optimizing one thing.

The price of knowing this term

A chat app at 10M output tokens/month on GPT-5 ($60/M) = $600. Same volume on DeepSeek V3 ($1.10/M) = $11. The right model choice is a ~50× price difference.
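The monthly comparison, checked in code:

```python
# 10M output tokens per month at the two quoted output prices.
monthly_tokens = 10_000_000
gpt5     = monthly_tokens * 60.00 / 1_000_000  # $600
deepseek = monthly_tokens * 1.10 / 1_000_000   # ~$11, roughly 55x cheaper
```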

Sequential generation is memory-bandwidth-bound: each decoded token requires a full read of the KV cache. Input tokens are processed in a parallel prefill pass, which is much more efficient per token.