
Output Tokens

The tokens the model writes back to you · priced 3-5× higher than input because decoding is sequential.


Level 1

Every AI API bills output tokens separately. GPT-5: $60/M out. Claude 4.5 Opus: $75/M out. Reasoning models charge output tokens for hidden reasoning too, so a single "answer" can burn 10,000+ billed tokens. Cheap tier: DeepSeek V3 at $1.10/M out.
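The per-answer math above is easy to sketch. A minimal cost estimator using the prices quoted in this section (the model keys are illustrative labels, not real API model IDs):

```python
# Output-token prices quoted above, in USD per 1M output tokens.
OUTPUT_PRICE_PER_M = {
    "gpt-5": 60.00,
    "claude-4.5-opus": 75.00,
    "deepseek-v3": 1.10,
}

def response_cost(model: str, output_tokens: int) -> float:
    """USD cost of one response, billed on output tokens only."""
    return output_tokens * OUTPUT_PRICE_PER_M[model] / 1_000_000

# A reasoning-style answer that burns 10,000 billed output tokens:
print(round(response_cost("gpt-5", 10_000), 2))        # 0.6
print(round(response_cost("deepseek-v3", 10_000), 4))  # 0.011
```

The same 10,000-token answer costs ~55× less on the cheap tier · which is why routing matters.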

Level 2

Output tokens are expensive because they're generated one at a time during the decode phase. Each token requires a full forward pass that streams the weights and KV cache from memory · decode is memory-bandwidth-bound. This is why faster GPUs and newer memory generations matter so much for inference economics. Speculative decoding yields 2-3× throughput on output tokens. FP8 and INT8 quantization of both weights and activations roughly doubles output throughput again.
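The bandwidth bound can be sketched with back-of-envelope arithmetic. This assumes the simplest case (batch size 1, every decode step streams all weights from HBM once; the 70B/3.35 TB/s figures are illustrative, not measured):

```python
# Rough decode throughput for a memory-bandwidth-bound model.
def tokens_per_second(param_count: float, bytes_per_param: float,
                      hbm_bandwidth_gbs: float) -> float:
    bytes_per_token = param_count * bytes_per_param  # weights streamed per decode step
    return hbm_bandwidth_gbs * 1e9 / bytes_per_token

# Hypothetical 70B-parameter dense model on a 3.35 TB/s (H100-class) GPU:
fp16 = tokens_per_second(70e9, 2, 3350)  # ~24 tok/s
fp8  = tokens_per_second(70e9, 1, 3350)  # ~48 tok/s
```

Halving bytes per parameter (FP16 → FP8) exactly doubles the bandwidth-bound throughput, which is why quantization shows up directly in output-token economics.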

Level 3

Output cost scales with model size × sequential token count ÷ cluster efficiency. The memory-bandwidth constraint on decode makes each generated token expensive relative to each input token. Reasoning models amplify this: o3 can burn 50,000+ reasoning tokens on a single hard problem · at $60/M that's $3 per answer. Optimization via prompt engineering (shorter replies, structured outputs) is often the highest-leverage cost control in production.
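The o3 example above, checked:

```python
# 50,000 reasoning tokens billed as output at $60 per 1M tokens.
reasoning_tokens = 50_000
price_per_million = 60.00
cost = reasoning_tokens * price_per_million / 1_000_000
print(cost)  # 3.0 (dollars per answer)
```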

The takeaway for you
If you are a
Researcher
  • Memory-bandwidth-bound: decoding is sequential
  • Speculative decoding + FP8 are the main throughput optimizations
  • Reasoning models produce 10-100× the output volume per answer
If you are a
Builder
  • Use structured outputs to cap response length
  • Reasoning models are expensive · route easy queries to cheap models
  • The max tokens parameter is your budget safety valve
If you are an
Investor
  • Output margin is where providers recoup input subsidies
  • The reasoning tier is a new premium output category
  • Cheap-tier competition (DeepSeek, Mistral) compresses output margins
If you are a
Curious · Normie
  • What the AI writes back · the expensive part
  • Why asking for long responses costs more
  • Reasoning AI pays for its "thinking" too
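The builder advice above can be sketched as a tiny router. Everything here is hypothetical (the model names, the keyword heuristic, and the idea that you pass the result to your API client): cap max tokens everywhere, and only pay reasoning-tier output prices where they pay off.

```python
# Hypothetical routing sketch: hard prompts go to a reasoning model with a
# larger (but still capped) token budget; everything else goes cheap and short.
def pick_model(prompt: str) -> dict:
    hard = any(k in prompt.lower() for k in ("prove", "derive", "debug"))
    return {
        "model": "reasoning-large" if hard else "cheap-small",
        "max_tokens": 4_096 if hard else 512,  # the budget safety valve
    }

print(pick_model("Summarize this email"))
# {'model': 'cheap-small', 'max_tokens': 512}
```

In production you'd use a classifier or a cheap model as the router rather than keyword matching, but the cost logic is the same.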
Gecko's take

Output token economics drove every major model architecture decision since 2024. MoE, reasoning tiers, speculative decoding · all optimizing one thing.

The price of knowing this term

A chat app at 10M output tokens/month on GPT-5 ($60/M) = $600. Same volume on DeepSeek V3 ($1.10/M) = $11. The right model choice is a ~50× price difference.
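The monthly comparison, checked in code:

```python
# 10M output tokens per month at the two quoted output prices.
monthly_tokens = 10_000_000
gpt5     = monthly_tokens * 60.00 / 1_000_000  # $600
deepseek = monthly_tokens * 1.10 / 1_000_000   # ~$11, roughly 55x cheaper
```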

Sequential generation is memory-bandwidth-bound: each decoded token requires a full read of the KV cache. Input tokens are processed in a parallel prefill pass, which is much more efficient per token.