Reserved Capacity
Reserved capacity sells dedicated inference slots at a flat hourly rate · you buy tokens-per-second, not per-call pricing.
Reserved capacity sells dedicated inference slots at a flat hourly rate · you buy tokens-per-second, not per-call pricing.
Basic
AWS Bedrock Provisioned Throughput, Google Vertex AI Provisioned Throughput, and Azure OpenAI Provisioned Throughput Units (PTUs) let enterprises reserve dedicated inference capacity. Pay hourly for a guaranteed TPS (tokens per second). If you're running steady high-volume workloads, reserved is cheaper than pay-per-call · and avoids rate limits and cold starts.
Deep
Reserved capacity breakeven depends on utilization. AWS Bedrock reservations start at $21/hour for a Claude Haiku unit (~200 TPS). If you run 24/7 at that throughput, it costs $15K/month flat · vs pay-per-call which might be $10-30K depending on input/output ratio. Azure OpenAI PTUs are similar: reserve 300-500 TPS blocks. Google offers minute-granularity on Vertex. Discounts increase with commitment length · 1-year reservations are 30-50% cheaper than on-demand.
Expert
PTU math: compute expected monthly tokens × per-token price. If reserved hourly × 730 hours < that number, reserved wins. Common for agents (token-hungry) and batch operations. Watch out for mixed workloads · reserved capacity is per-model-per-region · cross-region or cross-model calls still pay on-demand. Enterprise finops teams use a hybrid: reserve for baseline, burst on-demand for spikes. Azure PTU marketplace lets customers trade unused capacity mid-contract.
Depending on why you're here
- ·Reserved = flat hourly for guaranteed TPS
- ·Bedrock, Vertex, Azure all sell this as "provisioned throughput"
- ·1-year commit: 30-50% cheaper than on-demand
- ·Breakeven usually >80% sustained utilization
- ·Best for predictable high-volume workloads
- ·Mix reserved (baseline) + on-demand (burst) for cost optimization
- ·Reserved capacity = enterprise annual contract · revenue stability for providers
- ·Azure PTU + Bedrock PTU drive $10B+ ARR combined
- ·Shifts pricing dynamics toward capacity markets
- ·Like buying a monthly phone plan instead of paying per call
- ·Good for businesses with predictable heavy AI usage
- ·Locks in cost but you commit to a minimum
Reserved capacity is how enterprises actually scale. The implicit price floor for serious AI workloads.