Reasoning Token Billing
Reasoning models (o-series, Claude extended thinking, DeepSeek R1) bill hidden thinking tokens at the output rate · total cost can be 2-10× a non-reasoning query.
Basic
OpenAI's o1/o3/o4 models emit an extensive chain of thought before the final answer · those "reasoning tokens" are hidden from the user but billed at the output rate. A reasoning query that returns a 200-token answer might consume 3,000 reasoning tokens, making the effective cost 16× what the visible answer alone would bill. Anthropic Claude's extended thinking mode works similarly. DeepSeek R1 makes thinking tokens visible · you can audit them.
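The 16× figure above is just arithmetic on billed output tokens. A minimal sketch (the rate is a placeholder, not any provider's real price):

```python
# Effective-cost arithmetic from the example above: a 200-token visible
# answer produced after 3,000 hidden reasoning tokens, all billed at the
# same output rate.
def effective_multiplier(reasoning_tokens: int, answer_tokens: int) -> float:
    """Ratio of total billed output tokens to visible answer tokens."""
    return (reasoning_tokens + answer_tokens) / answer_tokens

mult = effective_multiplier(reasoning_tokens=3_000, answer_tokens=200)
print(mult)  # 16.0 -- you pay 16x what the visible answer suggests
```

The same function works in reverse for budgeting: given a target cost multiplier, it bounds how many reasoning tokens you can afford per answer token.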
Deep
Reasoning token pricing forces app builders to cap reasoning budgets. OpenAI added a `max_output_tokens` parameter that hard-caps total output tokens, reasoning included; most APIs added a `reasoning_effort` setting (low/medium/high) to tune the budget. Typical reasoning-to-answer ratios: ~20:1 for math-heavy tasks, ~2:1 for simple Q&A. For high-traffic apps, routing queries to a cheap non-reasoning model first and falling back to a reasoning model on hard cases is now standard practice.
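The cheap-first fallback pattern can be sketched as below. The model stubs and the confidence heuristic are hypothetical stand-ins for real API calls; only the routing shape is the point.

```python
# Cheap-first routing: send every query to an inexpensive non-reasoning
# model, escalate to a reasoning model only when a confidence heuristic
# flags the answer as weak. All names here are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # heuristic self-rating in [0, 1]

def cheap_model(query: str) -> Answer:
    # Placeholder for a call to a cheap non-reasoning model.
    conf = 0.4 if "prove" in query else 0.9  # toy heuristic
    return Answer(text=f"quick answer to {query!r}", confidence=conf)

def reasoning_model(query: str) -> Answer:
    # Placeholder for a call to an expensive reasoning model.
    return Answer(text=f"careful answer to {query!r}", confidence=0.95)

def route(query: str, threshold: float = 0.7) -> tuple[str, Answer]:
    ans = cheap_model(query)
    if ans.confidence >= threshold:
        return "cheap", ans
    return "reasoning", reasoning_model(query)  # fall back on hard cases

print(route("capital of France?")[0])  # cheap
print(route("prove this lemma")[0])    # reasoning
```

In production the heuristic is usually a classifier, a self-reported confidence score, or a verifier pass rather than a keyword check, but the cost structure is the same: most traffic never touches the reasoning model.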
Expert
Reasoning token opacity is controversial · OpenAI initially hid both the count and the content; community pressure pushed it to release token counts (not content) for o1 in early 2025. Transparency varies by provider: DeepSeek R1 ships visible reasoning; Anthropic exposes thinking budgets and token counts; OpenAI still hides the content. Pricing traps: some non-OpenAI providers charge reasoning tokens at the input rate, making them far cheaper; OpenAI and Anthropic charge at the output rate. Always check the pricing docs.
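The input-rate vs. output-rate trap in numbers, with illustrative rates (not any provider's real prices · output rates are often roughly 4-5× input rates, which is the assumption here):

```python
# Same 3,000 reasoning tokens, very different bills depending on which
# rate the provider applies to them. Rates are hypothetical.
INPUT_RATE = 3.00 / 1_000_000    # $/token billed as input (illustrative)
OUTPUT_RATE = 15.00 / 1_000_000  # $/token billed as output (illustrative)

def reasoning_cost(reasoning_tokens: int, at_output_rate: bool) -> float:
    """Dollar cost of the hidden reasoning tokens alone."""
    rate = OUTPUT_RATE if at_output_rate else INPUT_RATE
    return reasoning_tokens * rate

output_billed = reasoning_cost(3_000, at_output_rate=True)   # 0.045
input_billed = reasoning_cost(3_000, at_output_rate=False)   # 0.009
print(round(output_billed / input_billed, 1))  # 5.0 -- 5x the cost
```

At these assumed rates, the billing convention alone swings the reasoning portion of the bill by 5×, which is why the same model can look cheap on one provider and expensive on another.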
April 2026 · every major provider ships a reasoning model. Reasoning token billing is the #1 finops surprise for teams adopting them.
Depending on why you're here
- Hidden thinking tokens billed at output rate
- Typical ratio: 2-20× visible output
- OpenAI hides content · DeepSeek shows it
- Cap with `reasoning_effort` or `max_output_tokens`
- Use reasoning models only when needed · route cheaper queries elsewhere
- Test per-task ratio · it varies wildly by task type
- Reasoning models have the fattest margins · most tokens billed
- Catalyst for specialized reasoning providers (DeepSeek)
- Price competition on reasoning could cut margins fast
- AI "thinking" costs money even when you don't see it
- Smart models that "think" are more expensive per answer
- Not billed to you personally · built into your app's AI bill
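"Test per-task ratio" above can be done by logging token usage per task type and averaging. The record fields here are a hypothetical log schema, not any provider's API shape; the 20:1 and 2:1 ratios echo the Deep section:

```python
# Per-task reasoning-to-answer ratio audit over a usage log.
from collections import defaultdict

def audit(usage_log: list[dict]) -> dict[str, float]:
    """Average reasoning-to-answer token ratio per task type."""
    totals = defaultdict(lambda: [0, 0])  # task -> [reasoning, answer]
    for rec in usage_log:
        totals[rec["task"]][0] += rec["reasoning_tokens"]
        totals[rec["task"]][1] += rec["answer_tokens"]
    return {task: r / a for task, (r, a) in totals.items()}

log = [
    {"task": "math", "reasoning_tokens": 4_000, "answer_tokens": 200},
    {"task": "qa",   "reasoning_tokens": 400,   "answer_tokens": 200},
]
print(audit(log))  # {'math': 20.0, 'qa': 2.0}
```

Run this against a week of real traffic before committing to a reasoning model: a task mix dominated by 2:1 queries has a very different cost profile than one dominated by 20:1 queries.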
Reasoning token billing is the finops gotcha of 2026. Cap your reasoning_effort before deploying.