Pricing
Input, output, cache, arbitrage, free tiers.
Top 12 terms
The fraction of input tokens served from provider-side prompt cache · directly impacts effective pricing.
Submitting many prompts at once in a single batch job · providers discount 50% on delayed batch completion.
Pre-purchased dedicated throughput · flat hourly fee for guaranteed tokens-per-second from a specific model.
A flat fee per API call regardless of tokens · used for image generation, search, and some agent products.
Volume discounts at defined usage thresholds · $X/M tokens below 1B tokens, $Y/M above.
Pre-negotiated flat rate applied to all consumption · typical enterprise AI contract shape.
Per-token pricing differs by input type · text tokens vs image tokens vs audio seconds vs video frames.
Reasoning models bill thinking tokens separately from output · can 2-10× effective cost per query.
Tool-use / function-call responses count as output tokens · structured JSON tool calls are billed normally.
Preemptible GPU capacity at deep discount · AWS Spot, GCP Spot, Azure Spot · 60-90% off on-demand but can be evicted.
Bring Your Own Key · you pay the model provider directly, the app just routes your requests.
Tokens you send to the model · priced separately from output tokens, usually cheaper.