Reasoning Token Billing
Reasoning models (o-series, Claude extended thinking, DeepSeek R1) bill hidden thinking tokens at the output rate · total cost can be 2-10× a non-reasoning query.
Basic
OpenAI's o1/o3/o4 models emit an extensive chain of thought before the final answer · those "reasoning tokens" are hidden from the user but billed at the output rate. A reasoning query that returns a 200-token answer might consume 3,000 reasoning tokens, making the effective cost 16× what the visible answer alone would bill. Anthropic Claude's extended thinking mode works similarly. DeepSeek R1 makes thinking tokens visible · you can audit them.
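The 16× figure above is just arithmetic on billed output tokens. A minimal sketch (the rate is a placeholder, not any provider's real price):

```python
# Effective-cost arithmetic from the example above: a 200-token visible
# answer produced after 3,000 hidden reasoning tokens, all billed at the
# same output rate.
def effective_multiplier(reasoning_tokens: int, answer_tokens: int) -> float:
    """Ratio of total billed output tokens to visible answer tokens."""
    return (reasoning_tokens + answer_tokens) / answer_tokens

mult = effective_multiplier(reasoning_tokens=3_000, answer_tokens=200)
print(mult)  # 16.0 -- you pay 16x what the visible answer suggests
```

The same function works in reverse for budgeting: given a target cost multiplier, it bounds how many reasoning tokens you can afford per answer token.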
Deep
Reasoning token pricing forces app builders to cap reasoning budgets. OpenAI added a `max_output_tokens` parameter that hard-caps total output tokens, reasoning included; most APIs added a `reasoning_effort` setting (low/medium/high) to tune the budget. Typical reasoning-to-answer ratios: ~20:1 for math-heavy tasks, ~2:1 for simple Q&A. For high-traffic apps, routing queries to a cheap non-reasoning model first and falling back to a reasoning model on hard cases is now standard practice.
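The cheap-first fallback pattern can be sketched as below. The model stubs and the confidence heuristic are hypothetical stand-ins for real API calls; only the routing shape is the point.

```python
# Cheap-first routing: send every query to an inexpensive non-reasoning
# model, escalate to a reasoning model only when a confidence heuristic
# flags the answer as weak. All names here are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # heuristic self-rating in [0, 1]

def cheap_model(query: str) -> Answer:
    # Placeholder for a call to a cheap non-reasoning model.
    conf = 0.4 if "prove" in query else 0.9  # toy heuristic
    return Answer(text=f"quick answer to {query!r}", confidence=conf)

def reasoning_model(query: str) -> Answer:
    # Placeholder for a call to an expensive reasoning model.
    return Answer(text=f"careful answer to {query!r}", confidence=0.95)

def route(query: str, threshold: float = 0.7) -> tuple[str, Answer]:
    ans = cheap_model(query)
    if ans.confidence >= threshold:
        return "cheap", ans
    return "reasoning", reasoning_model(query)  # fall back on hard cases

print(route("capital of France?")[0])  # cheap
print(route("prove this lemma")[0])    # reasoning
```

In production the heuristic is usually a classifier, a self-reported confidence score, or a verifier pass rather than a keyword check, but the cost structure is the same: most traffic never touches the reasoning model.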
Expert
Reasoning token opacity is controversial · OpenAI initially hid both the count and the content; community pressure pushed it to release token counts (not content) for o1 in early 2025. Transparency varies by provider: DeepSeek R1 ships visible reasoning; Anthropic exposes thinking budgets and token counts; OpenAI still hides the content. Pricing traps: some non-OpenAI providers charge reasoning tokens at the input rate, making them far cheaper; OpenAI and Anthropic charge at the output rate. Always check the pricing docs.
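The input-rate vs. output-rate trap in numbers, with illustrative rates (not any provider's real prices · output rates are often roughly 4-5× input rates, which is the assumption here):

```python
# Same 3,000 reasoning tokens, very different bills depending on which
# rate the provider applies to them. Rates are hypothetical.
INPUT_RATE = 3.00 / 1_000_000    # $/token billed as input (illustrative)
OUTPUT_RATE = 15.00 / 1_000_000  # $/token billed as output (illustrative)

def reasoning_cost(reasoning_tokens: int, at_output_rate: bool) -> float:
    """Dollar cost of the hidden reasoning tokens alone."""
    rate = OUTPUT_RATE if at_output_rate else INPUT_RATE
    return reasoning_tokens * rate

output_billed = reasoning_cost(3_000, at_output_rate=True)   # 0.045
input_billed = reasoning_cost(3_000, at_output_rate=False)   # 0.009
print(round(output_billed / input_billed, 1))  # 5.0 -- 5x the cost
```

At these assumed rates, the billing convention alone swings the reasoning portion of the bill by 5×, which is why the same model can look cheap on one provider and expensive on another.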
April 2026 · every major provider ships a reasoning model. Reasoning token billing is the #1 finops surprise for teams adopting them.
Depending on why you're here
- Hidden thinking tokens billed at output rate
- Typical ratio: 2-20× visible output
- OpenAI hides content · DeepSeek shows it
- Cap with `reasoning_effort` or `max_output_tokens`
- Use reasoning models only when needed · route cheaper queries elsewhere
- Test per-task ratio · it varies wildly by task type
- Reasoning models have the fattest margins · most tokens billed
- Catalyst for specialized reasoning providers (DeepSeek)
- Price competition on reasoning could cut margins fast
- AI "thinking" costs money even when you don't see it
- Smart models that "think" are more expensive per answer
- Not billed to you personally · built into your app's AI bill
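"Test per-task ratio" above can be done by logging token usage per task type and averaging. The record fields here are a hypothetical log schema, not any provider's API shape; the 20:1 and 2:1 ratios echo the Deep section:

```python
# Per-task reasoning-to-answer ratio audit over a usage log.
from collections import defaultdict

def audit(usage_log: list[dict]) -> dict[str, float]:
    """Average reasoning-to-answer token ratio per task type."""
    totals = defaultdict(lambda: [0, 0])  # task -> [reasoning, answer]
    for rec in usage_log:
        totals[rec["task"]][0] += rec["reasoning_tokens"]
        totals[rec["task"]][1] += rec["answer_tokens"]
    return {task: r / a for task, (r, a) in totals.items()}

log = [
    {"task": "math", "reasoning_tokens": 4_000, "answer_tokens": 200},
    {"task": "qa",   "reasoning_tokens": 400,   "answer_tokens": 200},
]
print(audit(log))  # {'math': 20.0, 'qa': 2.0}
```

Run this against a week of real traffic before committing to a reasoning model: a task mix dominated by 2:1 queries has a very different cost profile than one dominated by 20:1 queries.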
Reasoning token billing is the finops gotcha of 2026. Cap your reasoning_effort before deploying.