Category · 16 terms · BenchGecko glossary

Pricing

Input, output, cache, arbitrage, free tiers.

Most-read in Pricing
Cache Hit Rate

The fraction of input tokens served from the provider-side prompt cache · cached tokens are typically billed at a reduced rate, so a higher hit rate lowers the effective input price.
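
As a rough illustration of how hit rate feeds into effective pricing, the sketch below blends a cached and an uncached input rate. The function name, parameters, and example prices are hypothetical, and it assumes a single discounted cached-read rate with no cache-write surcharge, which may not match any particular provider.

```python
def effective_input_price(base_price_per_m: float,
                          cached_price_per_m: float,
                          cache_hit_rate: float) -> float:
    """Blended input price per million tokens (hypothetical helper).

    Assumes cached tokens are billed at `cached_price_per_m` and all other
    input tokens at `base_price_per_m`; cache-write fees are ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (cache_hit_rate * cached_price_per_m
            + (1.0 - cache_hit_rate) * base_price_per_m)


# Illustrative numbers only: $3.00/M uncached, $0.30/M cached, 80% hit rate.
print(round(effective_input_price(3.00, 0.30, 0.8), 2))  # 0.84
```

Under these assumed prices, an 80% hit rate cuts the effective input price from $3.00 to about $0.84 per million tokens.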

Batched Inference

Submitting many prompts at once as a single batch job · providers typically discount around 50% in exchange for delayed completion.

Reserved Capacity

Pre-purchased dedicated throughput · flat hourly fee for guaranteed tokens-per-second from a specific model.

Per-Request Pricing

A flat fee per API call regardless of tokens · used for image generation, search, and some agent products.

Tiered Pricing

Volume discounts at defined usage thresholds · for example, $X per million tokens below 1B tokens and $Y per million above.
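
The two-tier shape described here can be worked through with a small sketch. It assumes marginal (graduated) tiers, where only the tokens above the threshold get the lower rate; some contracts instead reprice all usage once the threshold is crossed. The function name, threshold default, and prices are hypothetical.

```python
def tiered_cost(tokens: int,
                price_below_per_m: float,
                price_above_per_m: float,
                threshold_tokens: int = 1_000_000_000) -> float:
    """Total cost in dollars under a two-tier marginal price schedule.

    Tokens up to `threshold_tokens` are billed at `price_below_per_m`;
    only tokens beyond the threshold are billed at `price_above_per_m`.
    """
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    below = min(tokens, threshold_tokens)
    above = max(tokens - threshold_tokens, 0)
    return below / 1e6 * price_below_per_m + above / 1e6 * price_above_per_m


# Illustrative numbers only: 1.5B tokens at $2.00/M below 1B, $1.50/M above.
print(tiered_cost(1_500_000_000, 2.00, 1.50))  # 2750.0
```

With these assumed rates, the first 1B tokens cost $2,000 and the remaining 500M cost $750.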

Volume Discounts

Pre-negotiated flat rate applied to all consumption · typical enterprise AI contract shape.

Multi-Modal Pricing

Pricing differs by input type · text tokens, image tokens, audio seconds, and video frames are each metered at their own rate.

Reasoning Token Billing

Reasoning models bill thinking tokens separately from output · this can raise the effective cost per query by 2-10×.
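
To see where a multiplier in that range can come from, the sketch below compares the cost of one query with and without thinking tokens. It assumes reasoning tokens are charged at the output-token rate, which is an assumption and not confirmed here; the function name, token counts, and prices are hypothetical.

```python
def query_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one query (hypothetical helper).

    Assumes reasoning tokens are billed at the output-token rate.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1e6


# Illustrative numbers only: 1,000 input, 500 visible output, 4,000 thinking
# tokens, at $1.00/M input and $4.00/M output.
without_reasoning = query_cost(1_000, 500, 0, 1.00, 4.00)
with_reasoning = query_cost(1_000, 500, 4_000, 1.00, 4.00)
print(round(with_reasoning / without_reasoning, 1))  # 6.3
```

In this assumed case the thinking tokens make the query roughly 6.3× more expensive than the visible output alone would suggest.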

Function Call Billing

Tool-use / function-call responses count as output tokens · structured JSON tool calls are billed normally.

Spot Pricing

Preemptible GPU capacity at a deep discount · AWS Spot, GCP Spot, Azure Spot · 60-90% off on-demand pricing, but instances can be evicted.

BYOK

Bring Your Own Key · you pay the model provider directly, the app just routes your requests.

Input Tokens

Tokens you send to the model · priced separately from output tokens, usually cheaper.

The Pricing category covers 16 terms. Input, output, cache, arbitrage, free tiers. Every term has four depth levels (TL;DR, Basic, Deep, Expert), role-based takeaways, FAQs, and live BenchGecko data where available.