Output Tokens
The tokens the model writes back to you · priced 3-5× higher than input because decoding is sequential.
Basic
Every AI API bills output tokens separately. GPT-5: $60/M out. Claude 4.5 Opus: $75/M out. Reasoning models charge output tokens for hidden reasoning too, so a single "answer" can burn 10,000+ billed tokens. Cheap tier: DeepSeek V3 at $1.10/M out.
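The billing above is simple arithmetic worth making concrete: hidden reasoning tokens count against the same output rate as the visible answer. A minimal sketch, using the $60/M GPT-5 output price quoted above (function name is illustrative):

```python
def billed_output_cost(visible_tokens: int, reasoning_tokens: int,
                       price_per_million: float) -> float:
    """Total output cost in dollars: hidden reasoning tokens are billed
    at the same per-million rate as the visible answer tokens."""
    return (visible_tokens + reasoning_tokens) * price_per_million / 1_000_000

# A 500-token visible answer that burned 10,000 hidden reasoning tokens
# at $60/M output: 10,500 tokens billed.
cost = billed_output_cost(500, 10_000, 60.0)
print(f"${cost:.2f}")  # → $0.63
```

The visible answer is under 5% of the bill here, which is why reasoning-model invoices surprise people.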
Deep
Output tokens are expensive because they're generated one at a time during the decode phase. Each token requires a full forward pass that streams the model weights and KV cache from memory, so decode is memory-bandwidth bound. This is why faster GPUs and newer memory generations matter so much for inference economics. Speculative decoding yields 2-3× throughput on output tokens. FP8 and INT8 quantization of both weights and activations roughly doubles output throughput again.
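The memory-bandwidth bound gives a back-of-envelope ceiling on decode speed: each token must stream the full weight set from memory, so tokens/sec can't exceed bandwidth divided by model bytes. A rough sketch with illustrative numbers not taken from the text (a 70B-parameter model on a GPU with ~3.35 TB/s of HBM bandwidth, KV-cache traffic ignored):

```python
def decode_tokens_per_sec_ceiling(param_bytes: float, hbm_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed: every generated token
    requires streaming all weights from memory (KV cache ignored here)."""
    return hbm_bytes_per_sec / param_bytes

weights_fp8 = 70e9 * 1      # FP8: 1 byte per parameter
bandwidth = 3.35e12         # ~3.35 TB/s HBM, illustrative
print(f"{decode_tokens_per_sec_ceiling(weights_fp8, bandwidth):.0f} tok/s ceiling")
# Halving bytes per weight (FP8 vs FP16) doubles this ceiling —
# which is the quantization effect the paragraph above describes.
```

This also shows why speculative decoding helps: verifying several drafted tokens in one weight-streaming pass amortizes the same memory traffic over multiple output tokens.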
Expert
Output cost scales with model size × sequential token count ÷ cluster efficiency. The memory-bandwidth constraint on decode makes each generated token expensive relative to each input token. Reasoning models amplify this: o3 can burn 50,000+ reasoning tokens on a single hard problem, which at $60/M is $3 per answer. Optimization via prompt engineering (shorter replies, structured outputs) is often the highest-leverage cost control in production.
Depending on why you're here
- Memory-bandwidth-bound · sequential decoding
- Speculative decoding + FP8 are the main throughput optimizations
- Reasoning models 10-100× output volume per answer
- Use structured outputs to cap response length
- Reasoning models are expensive · route easy queries to cheap models
- Max tokens parameter is your budget safety valve
- Output margin is where providers recoup input subsidies
- Reasoning tier is a new premium output category
- Cheap tier competition (DeepSeek, Mistral) compresses output margin
- What the AI writes back · the expensive part
- Why asking for long responses costs more
- Reasoning AI pays for its "thinking" too
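Two of the controls listed above (routing easy queries to cheap models, using max tokens as a budget safety valve) can be sketched together. Everything here is a toy under stated assumptions: the keyword heuristic, model names, and prices are taken from the text or invented for illustration, not a real routing algorithm or API:

```python
# Output prices from the text above ($/M output tokens); names illustrative.
MODELS = {
    "deepseek-v3": {"out_per_million": 1.10},   # cheap tier
    "gpt-5":       {"out_per_million": 60.00},  # premium tier
}

def route(query: str, hard_keywords=("prove", "debug", "derive")) -> str:
    """Toy router: only queries that look hard go to the expensive model."""
    is_hard = any(k in query.lower() for k in hard_keywords)
    return "gpt-5" if is_hard else "deepseek-v3"

def max_output_cost(model: str, max_tokens: int) -> float:
    """Worst-case spend if the model exhausts its max_tokens budget."""
    return MODELS[model]["out_per_million"] * max_tokens / 1_000_000

model = route("summarize this email")
print(model, f"${max_output_cost(model, 1024):.4f}")  # → deepseek-v3 $0.0011
```

Real routers use classifiers or confidence scores rather than keywords, but the cost structure is the same: the max-tokens cap bounds the worst case, and the router decides which per-million rate that cap is multiplied by.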
Output-token economics have driven the major model-architecture decisions since 2024: MoE, reasoning tiers, speculative decoding · all optimizing the same cost.
A chat app at 10M output tokens/month on GPT-5 ($60/M) costs $600; the same volume on DeepSeek V3 ($1.10/M) costs $11. Model choice alone spans a ~55× price difference.