Reserved vs on-demand?

Reserved wins above ~80% sustained utilization. Below that, on-demand is cheaper.

Can I share reserved capacity?

Only within your account/tenant. Azure has a marketplace to trade unused PTUs mid-contract.

PricingReading · ~3 min · 68 words deep

Reserved Capacity

Reserved capacity sells dedicated inference slots at a flat hourly rate · you buy tokens-per-second, not per-call pricing.

TL;DR

Reserved capacity sells dedicated inference slots at a flat hourly rate · you buy tokens-per-second, not per-call pricing.

Level 1

Basic

AWS Bedrock Provisioned Throughput, Google Vertex AI Provisioned Throughput, and Azure OpenAI Provisioned Throughput Units (PTUs) let enterprises reserve dedicated inference capacity. Pay hourly for a guaranteed TPS (tokens per second). If you're running steady high-volume workloads, reserved is cheaper than pay-per-call · and avoids rate limits and cold starts.

Level 2

Deep

Reserved capacity breakeven depends on utilization. AWS Bedrock reservations start at $21/hour for a Claude Haiku unit (~200 TPS). If you run 24/7 at that throughput, it costs $15K/month flat · vs pay-per-call which might be $10-30K depending on input/output ratio. Azure OpenAI PTUs are similar: reserve 300-500 TPS blocks. Google offers minute-granularity on Vertex. Discounts increase with commitment length · 1-year reservations are 30-50% cheaper than on-demand.

Level 3

Expert

PTU math: compute expected monthly tokens × per-token price. If reserved hourly × 730 hours < that number, reserved wins. Common for agents (token-hungry) and batch operations. Watch out for mixed workloads · reserved capacity is per-model-per-region · cross-region or cross-model calls still pay on-demand. Enterprise finops teams use a hybrid: reserve for baseline, burst on-demand for spikes. Azure PTU marketplace lets customers trade unused capacity mid-contract.

The takeaway for you

Depending on why you're here

If you are a

Researcher

·Reserved = flat hourly for guaranteed TPS
·Bedrock, Vertex, Azure all sell this as "provisioned throughput"
·1-year commit: 30-50% cheaper than on-demand

If you are a

Builder

·Breakeven usually >80% sustained utilization
·Best for predictable high-volume workloads
·Mix reserved (baseline) + on-demand (burst) for cost optimization

If you are a

Investor

·Reserved capacity = enterprise annual contract · revenue stability for providers
·Azure PTU + Bedrock PTU drive $10B+ ARR combined
·Shifts pricing dynamics toward capacity markets

If you are a

Curious · Normie

·Like buying a monthly phone plan instead of paying per call
·Good for businesses with predictable heavy AI usage
·Locks in cost but you commit to a minimum

Gecko's take

Reserved capacity is how enterprises actually scale. The implicit price floor for serious AI workloads.

Frequently Asked Questions

Typically 1 month (AWS Bedrock) or 1 hour (Google Vertex minute-billed). 1-year commits get deeper discounts.

Reserved Capacity

Basic

Deep

Expert

Depending on why you're here

Frequently Asked Questions

Related terms

Glossary

Explore live data

Cite or embed