Do chat APIs use per-request pricing?

Rarely · most LLM chat APIs bill per token. Image generation, search, and tool APIs are the ones that commonly bill per request.

Enterprise abstraction · credits map to units of work (typically one request or one small batch), which hides the underlying token costs.


Per-Request Pricing

A flat fee per API request regardless of input or output size · common for image generation, web search, and some premium agent products.


Level 1

Basic

Per-request pricing is an alternative to per-token pricing. OpenAI's DALL-E 3 charges $0.04 per image at standard quality and resolution. Anthropic's web search tool charges $0.01 per search ($10 per 1,000), billed on top of normal token costs. Perplexity's API charges $0.005 per query. The logic: when a feature's cost per call is relatively predictable, providers simplify billing to a flat fee rather than metering tokens.
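As a rough illustration of the flat-fee logic, the sketch below totals a monthly bill from request counts alone, using the per-request prices quoted above as example figures. The `FLAT_FEES_USD` table and the `flat_fee_bill` function are hypothetical names made up for this sketch, not any provider's API, and real price lists change over time and may add token charges on top.

```python
from decimal import Decimal

# Example per-request prices (USD) taken from the text above.
# Illustrative only: real price lists change and may add token charges.
FLAT_FEES_USD = {
    "image": Decimal("0.04"),          # one generated image
    "web_search": Decimal("0.01"),     # one web search
    "answer_query": Decimal("0.005"),  # one search-answer query
}

def flat_fee_bill(request_counts: dict[str, int]) -> Decimal:
    """Total cost when every request is billed at a flat fee.

    Input and output size never enter the calculation: only the
    number of requests of each kind matters.
    """
    total = Decimal("0")
    for kind, count in request_counts.items():
        total += FLAT_FEES_USD[kind] * count
    return total

# Hypothetical monthly usage.
monthly = flat_fee_bill(
    {"image": 1_000, "web_search": 5_000, "answer_query": 20_000}
)
print(f"Monthly bill: ${monthly:.2f}")
```

Because the bill depends only on counts, it can be forecast from request volume without knowing anything about prompt or response length.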

Level 2

Deep

The core tradeoff in per-request pricing is predictability versus fairness. For features with fixed-size output (image generation, fixed-size embeddings, tool calls), a flat fee is simple and gives up little accuracy. For variable-length outputs, per-token pricing is fairer because each customer pays for what they actually consume. Some providers blend the two · a flat fee per request plus a per-token charge for output length. The use cases best served by per-request pricing are web search, image generation, and function or tool invocations. Enterprise customers often prefer per-request pricing because it makes budgets predictable.
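The three schemes described above can be compared side by side with a small sketch. Every price here is a made-up assumption chosen only to make the shape of the tradeoff visible, and the function names are hypothetical.

```python
# All prices below are hypothetical, chosen only to make the comparison visible.
FLAT_FEE_USD = 0.02                    # per-request plan: one fixed charge
TOKEN_PRICE_USD_PER_1K = 0.01          # per-token plan: per 1,000 output tokens
BLEND_BASE_FEE_USD = 0.005             # blended plan: small fixed charge...
BLEND_TOKEN_PRICE_USD_PER_1K = 0.0075  # ...plus a reduced per-token rate

def per_request_cost(output_tokens: int) -> float:
    """Flat fee: the argument is ignored, so the bill is fully predictable."""
    return FLAT_FEE_USD

def per_token_cost(output_tokens: int) -> float:
    """Metered: the bill scales linearly with output length."""
    return output_tokens / 1000 * TOKEN_PRICE_USD_PER_1K

def blended_cost(output_tokens: int) -> float:
    """Blend: a fixed base fee plus a metered charge for output length."""
    return BLEND_BASE_FEE_USD + output_tokens / 1000 * BLEND_TOKEN_PRICE_USD_PER_1K

for tokens in (200, 2_000, 20_000):
    print(
        f"{tokens:>6} tokens  flat=${per_request_cost(tokens):.4f}  "
        f"per-token=${per_token_cost(tokens):.4f}  "
        f"blended=${blended_cost(tokens):.4f}"
    )
```

With these assumed numbers, short responses overpay under the flat fee and long responses underpay, which is the fairness cost of predictability; the blended plan sits between the two.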

Level 3

Expert

Per-request pricing is common in enterprise contracts, where a "credit" or "unit" abstraction hides token metering. Salesforce Einstein sells "Generative AI credits" that map to requests; Microsoft's Copilot Studio has sold capacity in packs of "messages." This abstraction decouples customer pricing from underlying model costs, giving providers margin flexibility when they swap models behind the scenes. The downside for builders: per-request pricing makes it harder to compare costs across providers, because one provider's credit is not another's.
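A minimal sketch of the credit abstraction, assuming a hypothetical plan where each request consumes a fixed number of credits. The `CreditPlan` class, the prices, and the per-request model costs are all invented for illustration and do not describe any vendor's actual contract terms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CreditPlan:
    """Hypothetical customer-facing plan: the customer only ever sees credits."""
    usd_per_credit: float     # price the customer pays for one credit
    credits_per_request: int  # credits one request consumes

def customer_bill(plan: CreditPlan, requests: int) -> float:
    """What the customer pays; no token count or model cost appears here."""
    return plan.usd_per_credit * plan.credits_per_request * requests

def provider_margin(plan: CreditPlan, requests: int,
                    model_cost_per_request_usd: float) -> float:
    """Provider revenue minus the (hidden) cost of serving the requests."""
    return customer_bill(plan, requests) - model_cost_per_request_usd * requests

plan = CreditPlan(usd_per_credit=0.01, credits_per_request=2)
requests = 10_000

# Assumed average serving costs before and after a behind-the-scenes model swap.
for label, cost in (("model A", 0.012), ("model B", 0.004)):
    print(
        f"{label}: customer pays ${customer_bill(plan, requests):.2f}, "
        f"provider margin ${provider_margin(plan, requests, cost):.2f}"
    )
```

The customer's bill is identical in both rows; only the provider's margin moves when the underlying model changes, which is the decoupling the paragraph above describes.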

The takeaway for you

Depending on why you're here

If you are a

Researcher

·Flat fee per API call · decoupled from token count
·Common for image gen, web search, tool invocations
·Enables unit-based enterprise pricing abstractions

If you are a

Builder

·Use when output length varies wildly · caps unexpected bills
·Watch for providers hiding per-token costs behind "credits"
·Compare total call cost vs tokenized alternatives

If you are a

Investor

·Unit pricing is the enterprise abstraction layer · decouples margin from cost
·Allows providers to swap models without changing customer bills
·Standard in Salesforce, Microsoft, SAP AI suites

If you are a

Curious · Normie

·A fixed price per AI query · no surprises from long conversations
·Common for AI image makers and AI search
·Simpler than paying per word

Gecko's take

Per-request pricing is how enterprise AI gets sold. Simpler billing beats fair pricing with procurement committees.

Frequently Asked Questions

Per-request pricing tends to win for the buyer when output is long and unpredictable. For short outputs, per-token is usually cheaper.
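The crossover point can be estimated directly: a flat fee is the cheaper option once output length exceeds the fee divided by the per-token price. The sketch below assumes billing on output tokens only and uses hypothetical prices; both function names are made up for this example.

```python
def break_even_output_tokens(flat_fee_usd: float, usd_per_1k_tokens: float) -> float:
    """Output length at which a flat fee and per-token billing cost the same."""
    if usd_per_1k_tokens <= 0:
        raise ValueError("per-token price must be positive")
    return flat_fee_usd / usd_per_1k_tokens * 1000

def cheaper_plan(output_tokens: int, flat_fee_usd: float,
                 usd_per_1k_tokens: float) -> str:
    """Name the cheaper plan for a response of the given length."""
    threshold = break_even_output_tokens(flat_fee_usd, usd_per_1k_tokens)
    return "per-request" if output_tokens > threshold else "per-token"

# Hypothetical prices: $0.02 flat fee vs $0.01 per 1,000 output tokens.
threshold = break_even_output_tokens(0.02, 0.01)
print(f"Break-even at about {threshold:.0f} output tokens")
for tokens in (500, 5_000):
    print(tokens, "tokens ->", cheaper_plan(tokens, 0.02, 0.01))
```

In practice the comparison should use the distribution of real response lengths rather than a single figure, since the flat fee mainly protects against the long tail.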
