
Arbitrage (AI Pricing)


TL;DR

Same weights, different prices · open-weight models like Llama 4 run on dozens of providers at wildly different price points.

Level 1

Llama 4 70B is hosted by Fireworks, Together, DeepInfra, OpenRouter, Groq, Bedrock, Vertex, and Azure. Each charges a different price per million tokens. Arbitrage means routing your workload to the cheapest provider that meets your latency and reliability needs. On open-weight models, the spread from most expensive to cheapest provider can exceed 30×.
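The core move can be sketched in a few lines of Python. Provider names and per-million-token prices below are illustrative placeholders, not live quotes:

```python
# Hypothetical price list: USD per million tokens for the same open-weight model.
PRICES_PER_M = {
    "provider_a": 0.20,
    "provider_b": 0.90,
    "provider_c": 6.00,
}

def cheapest(prices: dict[str, float]) -> tuple[str, float]:
    """Return the provider with the lowest price per million tokens."""
    name = min(prices, key=prices.get)
    return name, prices[name]

name, price = cheapest(PRICES_PER_M)
print(f"Route to {name} at ${price:.2f}/M")  # Route to provider_a at $0.20/M

# The arbitrage gap: ratio of most expensive to cheapest hosting.
gap = max(PRICES_PER_M.values()) / min(PRICES_PER_M.values())
print(f"Arbitrage gap: {gap:.0f}x")  # Arbitrage gap: 30x
```

Real routing adds constraints on top of price, which Level 3 gets into.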

Level 2

Arbitrage dimensions: token price, latency, throughput, reliability, compliance. The cheapest provider often has the worst latency; the fastest often charges a premium. OpenRouter aggregates 200+ providers and routes automatically. Self-hosting hits zero marginal cost but adds engineering overhead. BenchGecko's /pricing/arbitrage/[model-slug] page tracks every hosted instance with price and latency per provider.

Level 3

The arbitrage gap: Llama 4 70B ranges from $0.20/M (Groq, fastest) to $6.00/M (some boutique providers), a 30× spread. Production arbitrage routing accounts for latency SLAs, regional availability, rate limits, and compliance. Tools like OpenRouter, Portkey, and LiteLLM handle routing. The arbitrage opportunity shrinks as a model ages (commoditization) and collapses when the model is deprecated.

The takeaway for you
If you are a
Researcher
  • Open-weight arbitrage gap up to 30× · proprietary APIs have no arbitrage
  • Latency-price trade-off is real · fastest costs more
  • Tools: OpenRouter, Portkey, LiteLLM for routing
If you are a
Builder
  • For open-weight models: check /pricing/arbitrage/[model] before picking a provider
  • Route by latency SLA + price + region
  • Self-host if volume justifies the engineering overhead
If you are a
Investor
  • Arbitrage compresses open-weight provider margins
  • Aggregators (OpenRouter) capture value by solving routing
  • Closed-model pricing power rests on the absence of arbitrage · a comparable open-weight release opens a gap
If you are a
Curious · Normie
  • Same AI, different prices on different websites
  • Like shopping for the cheapest airline
  • Only works for open-source AI models
Gecko's take

Arbitrage is why the open-weight ecosystem matters. Closed APIs have zero arbitrage · you pay what they charge.

The price of knowing this term

On 100M tokens/month of Llama 4 usage: Fireworks at $1.00/M = $100. Groq at $0.20/M = $20. Savings: $80/month. At 1B tokens: $800/month.
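The arithmetic above is a one-liner. A minimal sketch (prices are the same illustrative figures used in the paragraph):

```python
def monthly_savings(tokens_per_month: float, price_a: float, price_b: float) -> float:
    """USD saved per month by routing to the cheaper of two per-million prices."""
    millions = tokens_per_month / 1_000_000
    return millions * abs(price_a - price_b)

print(monthly_savings(100_000_000, 1.00, 0.20))    # 80.0
print(monthly_savings(1_000_000_000, 1.00, 0.20))  # 800.0
```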

Can you arbitrage GPT-5?

No · GPT-5 is proprietary. Only OpenAI and Azure OpenAI serve it, at essentially identical pricing.