
Batched Inference

TL;DR

Batch APIs let you submit thousands of prompts for offline processing at 50% of synchronous pricing · turnaround up to 24 hours.

Level 1

OpenAI's Batch API, Anthropic's Message Batches, and Gemini's Batch Prediction all offer the same tradeoff: submit a JSONL file of prompts, get results back within 24 hours, pay 50% of real-time pricing. Use cases: bulk content generation, offline data labeling, evaluation runs, embedding generation. Not for interactive apps.
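The JSONL submission step can be sketched as follows. The line format mirrors OpenAI's documented batch input (one JSON object per request, with `custom_id`, `method`, `url`, and `body`), but the model name and prompts here are illustrative:

```python
import json

# Build a batch input file: one JSON object per line. Each line carries a
# custom_id (used to match results back later), the target endpoint, and
# the request body. The prompts and model name are placeholders.
prompts = ["Summarize: the sky is blue.", "Summarize: water is wet."]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```

The file is then uploaded and referenced when creating the batch; each provider has its own wrapper around this same submit-a-file pattern.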

Level 2

Batch requests bypass synchronous rate limits · you can submit 50K prompts even if your sync rate limit is 1,000 RPM. The 50% discount applies to both input and output tokens. Providers run batches on spare capacity, which is why the SLA is "up to 24 hours"; typical real-world turnaround is 1-4 hours. Combined with prompt caching, batch pricing yields an effective 10-20× cost reduction vs naive synchronous calls.
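A rough sketch of how the savings stack. All prices and discounts below are hypothetical assumptions for illustration, not provider quotes; the workload is deliberately cache-heavy (a long shared prefix):

```python
# Hypothetical per-1M-token prices and discounts (illustrative only).
sync_input_price = 3.00      # $/1M input tokens, synchronous
sync_output_price = 15.00    # $/1M output tokens, synchronous
batch_discount = 0.5         # batch bills at 50% of sync pricing
cache_read_discount = 0.1    # assume cached input tokens bill at 10% of base

# Job: 1M requests, each with 10,000 input tokens (9,800 of them a shared
# cached prefix) and 200 output tokens.
n = 1_000_000
input_tok, cached_tok, output_tok = 10_000, 9_800, 200

naive = n * (input_tok / 1e6 * sync_input_price
             + output_tok / 1e6 * sync_output_price)

batched = n * batch_discount * (
    (input_tok - cached_tok) / 1e6 * sync_input_price
    + cached_tok / 1e6 * sync_input_price * cache_read_discount
    + output_tok / 1e6 * sync_output_price
)

print(f"naive: ${naive:,.0f}  batch+cache: ${batched:,.0f}  "
      f"reduction: {naive / batched:.1f}x")
```

Under these assumptions the stacked reduction lands around 10×; a longer cached prefix or cheaper cache-read rate pushes it toward the top of the 10-20× range.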

Level 3

Results map back to requests via custom_id (unique within a batch). Failed items are returned with error codes; you aren't charged for them. Anthropic batches support cache_control blocks, so prompt caching applies across the batch. Limits: OpenAI caps at 50K requests or 100MB per batch; Anthropic at 100K or 256MB. Cross-provider pattern: use batch for reranking, synthetic data generation, and large-scale evaluation. FinOps teams now mandate batch for any workflow that can tolerate 1-24h latency.
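The custom_id mapping and per-item error handling can be sketched like this. The two output lines are simplified stand-ins for a real provider's result file, not actual API responses:

```python
import json

# Parse a batch output file: match each result to its input by custom_id,
# and route failed items (returned with error codes) to a retry pile.
output_lines = [
    '{"custom_id": "task-0", "response": {"body": {"choices": '
    '[{"message": {"content": "ok"}}]}}, "error": null}',
    '{"custom_id": "task-1", "response": null, "error": '
    '{"code": "invalid_request", "message": "bad body"}}',
]

results, failures = {}, {}
for line in output_lines:
    rec = json.loads(line)
    if rec.get("error"):
        failures[rec["custom_id"]] = rec["error"]["code"]
    else:
        body = rec["response"]["body"]
        results[rec["custom_id"]] = body["choices"][0]["message"]["content"]

print(results)   # successes keyed by custom_id
print(failures)  # error codes keyed by custom_id, eligible for resubmission
```

Because failed items carry no charge, resubmitting just the `failures` keys in a fresh batch is the usual recovery path.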

The takeaway for you
If you are a
Researcher
  • 50% discount on input + output tokens
  • SLA up to 24 hours · typical 1-4 hours
  • Bypasses synchronous rate limits
If you are a
Builder
  • Use for eval runs, embeddings, bulk generation
  • JSONL submission · poll until complete
  • Combine with prompt caching for stacked savings
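The "poll until complete" step above can be sketched as a backoff loop. `get_batch_status` is a stand-in for a real SDK call (e.g. retrieving a batch by id and reading its status); the demo drives it with a fake status sequence instead of a live API:

```python
import time

def wait_for_batch(get_batch_status, batch_id, interval=60, max_interval=900):
    """Poll a batch's status with exponential backoff until it settles."""
    while True:
        status = get_batch_status(batch_id)
        if status in ("completed", "failed", "expired", "cancelled"):
            return status
        time.sleep(interval)  # batches take hours; poll sparingly
        interval = min(interval * 2, max_interval)

# Demo: a fake status sequence standing in for real API responses.
statuses = iter(["validating", "in_progress", "completed"])
final = wait_for_batch(lambda _id: next(statuses), "batch_123", interval=0)
print(final)
```

Backoff matters here: with multi-hour turnarounds, polling every few seconds just burns requests against your sync rate limit.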
If you are a
Investor
  • Batch utilizes spare provider capacity · win-win pricing
  • Enterprise workloads increasingly shift to batch
  • Catalyst for very large-scale AI training data generation
If you are a
Curious · Normie
  • A way to ask AI many questions at once for half price · but wait hours for answers
  • Good for big non-urgent jobs
  • Not for chat apps
Gecko's take

Batch API is the first cost-cutting lever for any non-chat workflow. 50% off is enormous · use it.

OpenAI (Batch API), Anthropic (Message Batches), Google (Batch Prediction on Gemini). All at 50% discount.