
Arbitrage (AI Pricing)


TL;DR

Same weights, different prices · open-weight models like Llama 4 run on dozens of providers at wildly different price points.

Level 1

Llama 4 70B is hosted by Fireworks, Together, DeepInfra, OpenRouter, Groq, Bedrock, Vertex, and Azure. Each charges a different price per million tokens. Arbitrage means routing your workload to the cheapest provider that meets your latency and reliability needs. On open-weight models, the spread from most expensive to cheapest provider can exceed 30×.
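The core move can be sketched in a few lines of Python. Provider names and per-million-token prices below are illustrative placeholders, not live quotes:

```python
# Hypothetical price list: USD per million tokens for the same open-weight model.
PRICES_PER_M = {
    "provider_a": 0.20,
    "provider_b": 0.90,
    "provider_c": 6.00,
}

def cheapest(prices: dict[str, float]) -> tuple[str, float]:
    """Return the provider with the lowest price per million tokens."""
    name = min(prices, key=prices.get)
    return name, prices[name]

name, price = cheapest(PRICES_PER_M)
print(f"Route to {name} at ${price:.2f}/M")  # Route to provider_a at $0.20/M

# The arbitrage gap: ratio of most expensive to cheapest hosting.
gap = max(PRICES_PER_M.values()) / min(PRICES_PER_M.values())
print(f"Arbitrage gap: {gap:.0f}x")  # Arbitrage gap: 30x
```

Real routing adds constraints on top of price, which Level 3 gets into.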

Level 2

Arbitrage dimensions: token price, latency, throughput, reliability, compliance. The cheapest provider often has the worst latency; the fastest often charges a premium. OpenRouter aggregates 200+ providers and routes automatically. Self-hosting hits zero marginal cost but adds engineering overhead. BenchGecko's /pricing/arbitrage/[model-slug] page tracks every hosted instance with price and latency per provider.

Level 3

The arbitrage gap: Llama 4 70B ranges from $0.20/M (Groq, fastest) to $6.00/M (some boutique providers), a 30× spread. Production arbitrage routing accounts for latency SLAs, regional availability, rate limits, and compliance. Tools like OpenRouter, Portkey, and LiteLLM handle routing. The arbitrage opportunity shrinks as a model ages (commoditization) and collapses when the model is deprecated.

The takeaway for you
If you are a
Researcher
  • Open-weight arbitrage gap up to 30× · proprietary APIs have no arbitrage
  • Latency-price trade-off is real · fastest costs more
  • Tools: OpenRouter, Portkey, LiteLLM for routing
If you are a
Builder
  • For open-weight models: check /pricing/arbitrage/[model] before picking a provider
  • Route by latency SLA + price + region
  • Self-host if volume justifies the engineering overhead
If you are a
Investor
  • Arbitrage compresses open-weight provider margins
  • Aggregators (OpenRouter) capture value by solving routing
  • Closed-model pricing power rests on the absence of arbitrage · a comparable open-weight release opens a gap
If you are a
Curious · Normie
  • Same AI, different prices on different websites
  • Like shopping for the cheapest airline
  • Only works for open-source AI models
Gecko's take

Arbitrage is why the open-weight ecosystem matters. Closed APIs have zero arbitrage · you pay what they charge.

The price of knowing this term

On 100M tokens/month of Llama 4 usage: Fireworks at $1.00/M = $100. Groq at $0.20/M = $20. Savings: $80/month. At 1B tokens: $800/month.
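The arithmetic above is a one-liner. A minimal sketch (prices are the same illustrative figures used in the paragraph):

```python
def monthly_savings(tokens_per_month: float, price_a: float, price_b: float) -> float:
    """USD saved per month by routing to the cheaper of two per-million prices."""
    millions = tokens_per_month / 1_000_000
    return millions * abs(price_a - price_b)

print(monthly_savings(100_000_000, 1.00, 0.20))    # 80.0
print(monthly_savings(1_000_000_000, 1.00, 0.20))  # 800.0
```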

Can you arbitrage GPT-5?

No · GPT-5 is proprietary. Only OpenAI and Azure OpenAI serve it, at essentially identical pricing.