Beta
PricingReading · ~3 min · 68 words deep

Reserved Capacity

Reserved capacity sells dedicated inference slots at a flat hourly rate · you buy tokens-per-second, not per-call pricing.

TL;DR

Reserved capacity sells dedicated inference slots at a flat hourly rate · you buy tokens-per-second, not per-call pricing.

Level 1

AWS Bedrock Provisioned Throughput, Google Vertex AI Provisioned Throughput, and Azure OpenAI Provisioned Throughput Units (PTUs) let enterprises reserve dedicated inference capacity. Pay hourly for a guaranteed TPS (tokens per second). If you're running steady high-volume workloads, reserved is cheaper than pay-per-call · and avoids rate limits and cold starts.

Level 2

Reserved capacity breakeven depends on utilization. AWS Bedrock reservations start at $21/hour for a Claude Haiku unit (~200 TPS). If you run 24/7 at that throughput, it costs $15K/month flat · vs pay-per-call which might be $10-30K depending on input/output ratio. Azure OpenAI PTUs are similar: reserve 300-500 TPS blocks. Google offers minute-granularity on Vertex. Discounts increase with commitment length · 1-year reservations are 30-50% cheaper than on-demand.

Level 3

PTU math: compute expected monthly tokens × per-token price. If reserved hourly × 730 hours < that number, reserved wins. Common for agents (token-hungry) and batch operations. Watch out for mixed workloads · reserved capacity is per-model-per-region · cross-region or cross-model calls still pay on-demand. Enterprise finops teams use a hybrid: reserve for baseline, burst on-demand for spikes. Azure PTU marketplace lets customers trade unused capacity mid-contract.

The takeaway for you
If you are a
Researcher
  • ·Reserved = flat hourly for guaranteed TPS
  • ·Bedrock, Vertex, Azure all sell this as "provisioned throughput"
  • ·1-year commit: 30-50% cheaper than on-demand
If you are a
Builder
  • ·Breakeven usually >80% sustained utilization
  • ·Best for predictable high-volume workloads
  • ·Mix reserved (baseline) + on-demand (burst) for cost optimization
If you are a
Investor
  • ·Reserved capacity = enterprise annual contract · revenue stability for providers
  • ·Azure PTU + Bedrock PTU drive $10B+ ARR combined
  • ·Shifts pricing dynamics toward capacity markets
If you are a
Curious · Normie
  • ·Like buying a monthly phone plan instead of paying per call
  • ·Good for businesses with predictable heavy AI usage
  • ·Locks in cost but you commit to a minimum
Gecko's take

Reserved capacity is how enterprises actually scale. The implicit price floor for serious AI workloads.

Typically 1 month (AWS Bedrock) or 1 hour (Google Vertex minute-billed). 1-year commits get deeper discounts.