
Reasoning Token Billing

TL;DR

Reasoning models (o-series, Claude extended thinking, DeepSeek R1) bill hidden thinking tokens at the output rate · total cost can be 2-10× a non-reasoning query.

Level 1

OpenAI o1/o3/o4 emit extensive chain-of-thought before the final answer · those "reasoning tokens" are hidden from the user but billed at the output rate. A reasoning query that returns a 200-token answer might consume 3,000 reasoning tokens, making the effective cost 16× the visible answer. Anthropic Claude's extended thinking mode works similarly. DeepSeek R1 made thinking tokens visible · you can audit them.
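The 16× figure above is just a token ratio: reasoning and answer tokens are both billed at the output rate, so the effective multiplier is (reasoning + answer) / answer. A minimal sketch:

```python
def effective_cost_multiplier(reasoning_tokens: int, answer_tokens: int) -> float:
    """Hidden reasoning tokens and the visible answer are billed at the
    same output rate, so the cost multiplier is just a token ratio."""
    return (reasoning_tokens + answer_tokens) / answer_tokens

# 3,000 hidden reasoning tokens + a 200-token visible answer
print(effective_cost_multiplier(3000, 200))  # 16.0
```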

Level 2

Reasoning token pricing forces app builders to cap reasoning budgets. OpenAI added a `max_output_tokens` parameter that hard-caps total tokens (reasoning plus answer); most APIs added a `reasoning_effort` setting (low/medium/high) to tune the budget. Typical reasoning-to-answer ratios: ~20:1 for math-heavy tasks, ~2:1 for simple Q&A. For high-traffic apps, routing queries through a cheap non-reasoning pass first and falling back to a reasoning model on hard cases is now standard practice.
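A capped request might look like the sketch below. The parameter names follow OpenAI's Responses-style API and the model ID is illustrative — check your provider's docs before relying on either:

```python
# Sketch of a capped reasoning request (OpenAI Responses-style shape;
# the model ID is a placeholder assumption, not a recommendation).
def build_capped_request(prompt: str,
                         effort: str = "low",
                         max_output_tokens: int = 1024) -> dict:
    """Cap both the reasoning budget (effort) and the hard token ceiling."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "o4-mini",                      # placeholder model ID
        "input": prompt,
        "reasoning": {"effort": effort},         # tunes hidden-token budget
        "max_output_tokens": max_output_tokens,  # hard cap: reasoning + answer
    }

req = build_capped_request("Summarize this contract clause.")
```

Capping both knobs matters: `reasoning_effort` shapes how much the model thinks, while `max_output_tokens` is the hard stop that bounds your worst-case bill.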

Level 3

Reasoning token opacity is controversial · OpenAI initially hid both the count and the content; community pressure forced release of token counts (not content) for o1 in early 2025. Transparency varies by provider: DeepSeek R1 ships visible reasoning; Anthropic shows budgets and counts; OpenAI still hides content. Pricing traps: some non-OpenAI providers charge reasoning tokens at the input rate, making them several times cheaper; OpenAI and Anthropic charge at the output rate. Always check the provider's pricing docs.
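Because some providers bill reasoning tokens at the input rate and others at the output rate, the same query can cost very differently. A sketch with illustrative per-million-token prices (not real rate cards):

```python
def query_cost(input_toks: int, reasoning_toks: int, answer_toks: int,
               in_rate: float, out_rate: float,
               reasoning_at_input_rate: bool = False) -> float:
    """Cost in dollars; rates are per million tokens."""
    reasoning_rate = in_rate if reasoning_at_input_rate else out_rate
    return (input_toks * in_rate
            + reasoning_toks * reasoning_rate
            + answer_toks * out_rate) / 1_000_000

# Illustrative prices: $2/M input, $8/M output;
# 500 input, 3,000 reasoning, 200 answer tokens
at_output = query_cost(500, 3000, 200, 2, 8)
at_input  = query_cost(500, 3000, 200, 2, 8, reasoning_at_input_rate=True)
print(at_output, at_input)  # 0.0266 vs 0.0086 -- ~3x difference
```

With these assumed rates, the billing convention alone changes per-query cost by roughly 3×, which is why the same model hosted by two providers can have very different effective pricing.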

Why this matters now

April 2026 · every major provider ships a reasoning model. Reasoning token billing is the #1 finops surprise for teams adopting them.

The takeaway for you
If you are a
Researcher
  • Hidden thinking tokens billed at output rate
  • Typical ratio: 2-20× visible output
  • OpenAI hides content · DeepSeek shows it
If you are a
Builder
  • Cap with reasoning_effort or max_output_tokens
  • Use reasoning models only when needed · route cheaper queries elsewhere
  • Test per-task ratio · it varies wildly by task type
If you are a
Investor
  • Reasoning models have the fattest margins · most tokens billed
  • Catalyst for specialized reasoning providers (DeepSeek)
  • Price competition on reasoning could cut margins fast
If you are a
Curious · Normie
  • AI "thinking" costs money even when you don't see it
  • Smart models that "think" are more expensive per answer
  • Not billed to you personally · built into your app's AI bill
Gecko's take

Reasoning token billing is the finops gotcha of 2026. Cap your reasoning_effort before deploying.

Count yes, content no. The API returns `reasoning_tokens` in usage but the actual reasoning content is hidden by policy.
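That count is enough for a per-response audit. The field path below follows OpenAI's chat-completions usage shape (`completion_tokens_details.reasoning_tokens`); other providers expose the count differently, so treat the exact keys as an assumption:

```python
def reasoning_ratio(usage: dict) -> float:
    """Ratio of hidden reasoning tokens to visible answer tokens,
    from an OpenAI-style usage object (key names assumed)."""
    details = usage.get("completion_tokens_details", {})
    reasoning = details.get("reasoning_tokens", 0)
    visible = usage["completion_tokens"] - reasoning
    return reasoning / visible if visible else float("inf")

# Example usage payload (values illustrative)
usage = {"completion_tokens": 3200,
         "completion_tokens_details": {"reasoning_tokens": 3000}}
print(reasoning_ratio(usage))  # 15.0 -- 3,000 hidden vs 200 visible tokens
```

Logging this ratio per request type is the cheapest way to catch the 20:1 math-task blowups before they hit the monthly bill.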