Arbitrage · Llama 4 · 6 providers · $0.030 → $0.220 · 86% spread
Cheapest Provider for Llama 4
Meta's flagship open-weight model · runs on six major inference providers with a roughly 7x spread in input price.
Cheapest input
Groq
$0.030/M
Fastest inference · LPU hardware
Fastest
Groq
750 tok/s
Fastest inference · LPU hardware
Savings calculator
Save 86%
vs Replicate at $0.220/M input. For 100M input tokens/mo, that is $19/mo saved by routing to Groq.
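The savings figure follows from the two input prices alone. A minimal Python sketch of that arithmetic is below; the function name `monthly_savings` and its parameters are illustrative only and not part of any provider's API, and output tokens are ignored.

```python
def monthly_savings(tokens_millions: float, cheap_per_m: float, expensive_per_m: float):
    """Return (dollars saved per month, fractional saving) for input tokens only."""
    price_gap = expensive_per_m - cheap_per_m
    saved = tokens_millions * price_gap      # absolute saving in USD
    fraction = price_gap / expensive_per_m   # saving relative to the pricier provider
    return saved, fraction


# Groq ($0.030/M) vs Replicate ($0.220/M) at 100M input tokens per month.
saved, fraction = monthly_savings(100, cheap_per_m=0.030, expensive_per_m=0.220)
print(f"${saved:.2f}/mo saved, {fraction:.0%} cheaper")
# prints: $19.00/mo saved, 86% cheaper
```

The absolute saving scales linearly with volume, so at 1B input tokens per month the same gap would be about $190.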
Sorted by input price
All 6 providers
| Provider | In $/M | Out $/M | Notes |
|---|---|---|---|
| Groq | $0.030 | $0.050 | Fastest inference · LPU hardware |
| Fireworks AI | $0.050 | $0.100 | Serverless · batch 50% off |
| Together AI | $0.090 | $0.090 | Flat pricing · $5 free credit |
| OpenRouter | $0.100 | $0.150 | Meta-router · fall-through pricing |
| DeepInfra | $0.180 | $0.180 | OpenAI-compatible · no free tier |
| Replicate | $0.220 | $0.300 | Pay per second · cold starts |
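Input price is only part of the bill, since output tokens are priced separately. The sketch below ranks providers by total monthly cost for a given input/output mix, using the prices from the table above. Two things are assumptions here: the pairing of each price row with a provider follows the order of the provider notes, and the names `PRICES`, `monthly_cost`, and `rank_by_cost` are hypothetical helpers, not any provider's SDK.

```python
# USD per million tokens as (input, output), taken from the table above.
# Assumption: rows are paired with providers in the order the notes list them.
PRICES = {
    "Groq": (0.030, 0.050),
    "Fireworks AI": (0.050, 0.100),
    "Together AI": (0.090, 0.090),
    "OpenRouter": (0.100, 0.150),
    "DeepInfra": (0.180, 0.180),
    "Replicate": (0.220, 0.300),
}


def monthly_cost(provider: str, input_m: float, output_m: float) -> float:
    """Total USD cost for input_m and output_m million tokens."""
    in_price, out_price = PRICES[provider]
    return input_m * in_price + output_m * out_price


def rank_by_cost(input_m: float, output_m: float) -> list[tuple[str, float]]:
    """Providers sorted from cheapest to most expensive for this workload."""
    costs = [(name, monthly_cost(name, input_m, output_m)) for name in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


# Example workload: 100M input tokens and 20M output tokens per month.
for name, cost in rank_by_cost(100, 20):
    print(f"{name:<14} ${cost:.2f}")
```

With these prices the ranking matches the input-price order for any mix, because output prices rise in the same order. The helper is more useful when prices change or a provider discount applies.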
Frequently Asked Questions
Groq is the cheapest provider for Llama 4, at $0.030/M input and $0.050/M output. On input price, that is 86% cheaper than Replicate. Groq also has the fastest inference, running on LPU hardware.