Arbitrage · Llama 4 · 6 providers · $0.030 to $0.220 · 86% spread

Cheapest Provider for Llama 4

Meta's flagship open-weight model · runs on six major inference providers with a roughly 7x input-price spread ($0.030 to $0.220 per million tokens).

405B · Llama Community License
Cheapest input: Groq · Fastest inference · LPU hardware
Fastest: Groq · Fastest inference · LPU hardware
Savings calculator
Compared with Replicate at $0.220/M input, routing 100M input tokens/mo to Groq at $0.030/M saves $19/mo.
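The savings figure is simple arithmetic: the per-million price difference multiplied by monthly volume in millions of tokens. A minimal sketch follows, assuming input tokens only; the function name `monthly_savings` is hypothetical, and the prices are the ones quoted on this page.

```python
from decimal import Decimal


def monthly_savings(baseline_per_m: str, cheapest_per_m: str, tokens_millions: str) -> Decimal:
    """USD saved per month on input tokens by routing to the cheaper provider.

    Prices are in USD per million tokens; volume is in millions of tokens.
    Decimal avoids float rounding noise in currency arithmetic.
    """
    difference = Decimal(baseline_per_m) - Decimal(cheapest_per_m)
    return difference * Decimal(tokens_millions)


# Replicate ($0.220/M) vs Groq ($0.030/M) at 100M input tokens per month.
savings = monthly_savings("0.220", "0.030", "100")
print(f"${savings:.2f}/mo saved")
```

This counts input tokens only; a workload with substantial output would also need the output price difference added in.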
Sorted by input price

| Provider | Input $/M | Output $/M |
| --- | --- | --- |
| Groq (Winner) | $0.030 | $0.050 |
| Fireworks AI | $0.050 | $0.100 |
| Together AI | $0.090 | $0.090 |
| OpenRouter | $0.100 | $0.150 |
| DeepInfra | $0.180 | $0.180 |
| Replicate | $0.220 | $0.300 |
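The winner and the 86% spread can be reproduced from the table. The sketch below is one way to do it, assuming the spread is defined as (highest − lowest) / highest input price; the names `PRICES`, `cheapest_by_input`, and `input_spread` are hypothetical, and the figures are copied from the table above.

```python
# Provider -> (input $/M, output $/M), copied from the table on this page.
PRICES = {
    "Groq": (0.030, 0.050),
    "Fireworks AI": (0.050, 0.100),
    "Together AI": (0.090, 0.090),
    "OpenRouter": (0.100, 0.150),
    "DeepInfra": (0.180, 0.180),
    "Replicate": (0.220, 0.300),
}


def cheapest_by_input(prices: dict[str, tuple[float, float]]) -> str:
    """Return the provider with the lowest input price."""
    return min(prices, key=lambda name: prices[name][0])


def input_spread(prices: dict[str, tuple[float, float]]) -> float:
    """Fractional gap between the highest and lowest input price.

    Assumed definition: (highest - lowest) / highest.
    """
    inputs = [input_price for input_price, _ in prices.values()]
    lowest, highest = min(inputs), max(inputs)
    return (highest - lowest) / highest


winner = cheapest_by_input(PRICES)
spread = input_spread(PRICES)
print(winner, f"{spread:.0%}")
```

The ranking here uses input price alone. Sorting by output price instead would put Together AI ($0.090) ahead of Fireworks AI ($0.100), so output-heavy workloads may rank the providers differently.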

Notes:
Groq · Fastest inference · LPU hardware
Fireworks AI · Serverless · batch 50% off
Together AI · Flat pricing · $5 free credit
OpenRouter · Meta-router · fall-through pricing
DeepInfra · OpenAI-compatible · no free tier
Replicate · Pay per second · cold starts

Groq is the cheapest provider at $0.030/M input and $0.050/M output. Its input price is 86% lower than Replicate's $0.220/M. Fastest inference · LPU hardware.