Every System · Tracked
Every rack-scale AI system from every manufacturer. GPU counts, FP8 PFLOPS, power draw, cooling type, TCO per PFLOPS, and known datacenter deployments · sourced from spec sheets, earnings calls, and disclosed infrastructure builds.
- TCO across ranked systems spans $973 to $14,734 per PFLOPS per year
- DGX GB300 NVL72 offers the lowest TCO at $973/PFLOPS/year
- Liquid-cooled systems average 6% lower TCO than air-cooled
- 4 of 8 ranked systems are NVIDIA-based
Systems table
10 systems · sorted by FP8 PFLOPS · all manufacturers
| System | Manufacturer | GPUs | FP8 PFLOPS |
|---|---|---|---|
| TPU v5p Pod | Google | 8,960 | 8,100 |
| DGX GB300 NVL72 | NVIDIA | 72 | 1,440 |
| DGX GB200 NVL72 | NVIDIA | 72 | 720 |
| TPU v6e Pod | Google | 256 | 230 |
| Maia 100 Rack | Microsoft | 32 | 96 |
| DGX B200 | NVIDIA | 8 | 80 |
| MI325X Platform | AMD | 8 | 48 |
| Trn2 UltraServer | AWS | 16 | 48 |
| HGX H100 | NVIDIA | 8 | 32 |
| CS-3 | Cerebras | 1 | TBD |
TCO comparison
$/PFLOPS/year · hardware amortized 3yr + power at $0.05/kWh · lower is better
Cooling breakdown
Liquid vs air · power stats · efficiency comparison
Liquid cooling enables PUE of 1.05 to 1.15 vs 1.3 to 1.5 for air cooling. At datacenter scale, this translates to 15 to 30% lower power costs. Liquid-cooled systems also allow higher GPU density per rack, reducing interconnect latency and physical footprint. Every major new AI rack (DGX GB200, MI350X cluster) ships liquid-cooled by default.
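A rough sketch of what that PUE gap means in power cost for a single rack (the 120 kW rack draw is an assumption for illustration; the $0.05/kWh rate reuses the TCO methodology above):

```python
# Rough annual power-cost comparison for one rack under different PUE values.
# Assumptions (illustrative): 120 kW IT load per rack, $0.05/kWh, 24/7 operation.

HOURS_PER_YEAR = 8760
IT_LOAD_KW = 120          # rack IT (chip + server) power draw, assumed
PRICE_PER_KWH = 0.05      # average datacenter rate used elsewhere on this page

def annual_power_cost(it_load_kw: float, pue: float) -> float:
    """Facility power cost per year: IT load scaled by PUE overhead."""
    return it_load_kw * pue * HOURS_PER_YEAR * PRICE_PER_KWH

liquid = annual_power_cost(IT_LOAD_KW, pue=1.10)   # mid-range liquid-cooled PUE
air = annual_power_cost(IT_LOAD_KW, pue=1.40)      # mid-range air-cooled PUE

print(f"liquid-cooled: ${liquid:,.0f}/yr")
print(f"air-cooled:    ${air:,.0f}/yr")
print(f"savings:       {1 - liquid / air:.0%}")
```

At these midpoint PUEs the gap is about 21%, inside the 15 to 30% range quoted above.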
Known deployments
19 disclosed deployments · who is building what
| Operator | System |
|---|---|
| Oracle Cloud | |
All systems
10 systems · 6 manufacturers
- TPU v5p Pod: Google's high-performance TPU pod for large-scale training. 8,960 TPU v5p chips in a single pod connected via ICI 3.0 fabric. Powers Gemini ...
- DGX GB300 NVL72: Next-gen liquid-cooled rack with Blackwell Ultra GPUs. 72x B300 GPUs with 288 GB HBM3e per GPU (vs 192 GB on B200). Designed for reasoning-h...
- DGX GB200 NVL72: NVIDIA's flagship liquid-cooled rack. 72 Blackwell GPUs + 36 Grace CPUs connected via NVLink 5.0 in a single 72-GPU domain. Designed for tri...
- TPU v6e Pod: Google's latest custom AI accelerator in pod configuration. 256 TPU v6e chips connected via custom ICI (Inter-Chip Interconnect). Optimized ...
- Maia 100 Rack: Microsoft's first custom AI silicon at rack scale. Maia 100 chips fabricated at TSMC on N5 with HBM3e. Designed for Azure AI inference and f...
- DGX B200: 8-GPU Blackwell node for enterprises that don't need the full NVL72 rack. Air-cooled with NVLink 4.0 interconnect. The workhorse for inferen...
- MI325X Platform: AMD's latest 8-GPU OAM platform with MI325X accelerators. 256 GB HBM3e per GPU for the largest memory footprint in its class. Infinity Fabri...
- Trn2 UltraServer: AWS custom silicon training server. 16x Trainium2 chips in a single UltraServer node connected via NeuronLink. Designed to compete with NVID...
- HGX H100: The system that launched the AI infrastructure boom. 8x H100 SXM GPUs connected via NVLink 4.0. Still the most widely deployed AI training s...
- CS-3: Wafer-scale AI compute appliance. A single CS-3 contains one WSE-3 chip (the largest chip ever made, using an entire 300mm wafer). 44 GB of ...
Frequently asked
Pulled from the live dataset · schema-ready for AEO · a serialization sketch follows the answers below
What is a DGX GB200 NVL72?
The DGX GB200 NVL72 is NVIDIA's flagship AI compute rack. It contains 72 Blackwell GPUs and 36 Grace CPUs in a single liquid-cooled rack, connected via NVLink 5.0. It delivers up to 720 PFLOPS of FP8 compute and requires 120+ kW of power. It's the system training the largest AI models at Microsoft, Oracle, CoreWeave, and xAI.
What does TCO per PFLOPS mean?
Total Cost of Ownership per PFLOPS per year normalizes the cost of different AI systems to a comparable metric. It includes hardware cost (amortized over 3 years) plus power costs (at $0.05/kWh average datacenter rate) divided by the system's FP8 compute output in PFLOPS. Lower is better. This is the metric datacenter operators optimize when choosing between NVIDIA, AMD, Google TPU, or custom silicon.
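A minimal sketch of that calculation (the hardware price, power draw, and PFLOPS figures below are hypothetical round numbers, not values from the dataset):

```python
# TCO per PFLOPS per year =
#   (hardware cost amortized over 3 years + annual power cost) / FP8 PFLOPS
# Example inputs are hypothetical, not dataset values.

HOURS_PER_YEAR = 8760
PRICE_PER_KWH = 0.05        # average datacenter rate from the methodology above
AMORTIZATION_YEARS = 3

def tco_per_pflops_year(hardware_cost_usd: float, power_kw: float, fp8_pflops: float) -> float:
    hardware_per_year = hardware_cost_usd / AMORTIZATION_YEARS
    power_per_year = power_kw * HOURS_PER_YEAR * PRICE_PER_KWH
    return (hardware_per_year + power_per_year) / fp8_pflops

# Hypothetical rack-scale system: $3M hardware, 120 kW draw, 1,000 FP8 PFLOPS
print(f"${tco_per_pflops_year(3_000_000, 120, 1_000):,.0f} per PFLOPS per year")
```

Swap in a real system's hardware cost, power draw, and FP8 PFLOPS from the table above to reproduce the ranking.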
Why do some systems require liquid cooling?
Modern AI chips consume 700-1200W each. An NVL72 rack with 72 GPUs at 1000W each generates 72 kW of heat from GPUs alone. Air cooling cannot efficiently remove this much heat. Liquid cooling (direct-to-chip or immersion) is 10-100x more effective at heat transfer, enabling higher GPU density per rack, lower PUE (1.1 vs 1.3-1.5 for air), and higher chip performance due to better thermal management.
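A back-of-the-envelope sketch of the heat-removal gap, using heat load = mass flow x specific heat x temperature rise (the 72 kW load is from the example above; the 15 degC coolant temperature rise is an assumed value):

```python
# How much air vs. water must flow through a rack to carry away 72 kW of heat.
# Q = m_dot * c_p * delta_T; the 15 degC temperature rise is an assumption.

HEAT_LOAD_KW = 72          # 72 GPUs x ~1,000 W, from the example above
DELTA_T_C = 15             # assumed coolant temperature rise across the rack

CP_AIR = 1.006             # kJ/(kg*K)
RHO_AIR = 1.2              # kg/m^3 at ~20 degC
CP_WATER = 4.18            # kJ/(kg*K)

air_kg_per_s = HEAT_LOAD_KW / (CP_AIR * DELTA_T_C)
air_m3_per_s = air_kg_per_s / RHO_AIR
water_kg_per_s = HEAT_LOAD_KW / (CP_WATER * DELTA_T_C)    # ~1 kg of water ~ 1 liter

print(f"air:   {air_m3_per_s:.1f} m^3/s  (~{air_m3_per_s * 2118:.0f} CFM)")
print(f"water: {water_kg_per_s:.2f} L/s  (~{water_kg_per_s * 15.85:.0f} GPM)")
```

Roughly 4 m^3/s of air versus about a liter per second of water to move the same 72 kW, which is why dense racks go direct-to-chip liquid.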
How does a TPU pod compare to an NVIDIA DGX rack?
Google TPU pods and NVIDIA DGX racks are fundamentally different architectures. A TPU v5p pod can contain 8,960 chips in a single interconnected domain via ICI fabric. NVIDIA's NVL72 is a 72-GPU rack-scale system. TPU pods offer massive parallelism for Google's own model architectures but are only available on Google Cloud. DGX systems are available from multiple OEMs for on-premise deployment.
What is the Cerebras CS-3 and why is it different?
The Cerebras CS-3 uses a single WSE-3 chip, the largest chip ever made, occupying an entire 300mm silicon wafer. Instead of using HBM stacks, it has 44 GB of on-chip SRAM for zero-latency memory access. This eliminates the memory bandwidth bottleneck that limits conventional GPU systems. The tradeoff is higher per-system cost and a different programming model.
Why are hyperscalers building custom silicon?
Google (TPU), AWS (Trainium), Microsoft (Maia), and Meta (MTIA) are all building custom AI chips to reduce dependency on NVIDIA, optimize for their specific workloads, and lower costs at scale. At hyperscaler volume (hundreds of thousands of chips), even a 10-20% efficiency gain over NVIDIA translates to billions in savings. Custom silicon also provides supply chain diversification.
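The "schema-ready for AEO" note above means these Q&A pairs can be emitted as schema.org FAQPage structured data. A minimal serialization sketch (the abbreviated answers are trimmed from the full text above; the dataset's actual export format is not specified on this page):

```python
# Minimal sketch: serializing the FAQ entries above into schema.org FAQPage
# JSON-LD, the structured-data format answer engines and search crawlers read.
import json

faq = [
    ("What is a DGX GB200 NVL72?",
     "NVIDIA's flagship AI compute rack: 72 Blackwell GPUs and 36 Grace CPUs "
     "in a single liquid-cooled rack connected via NVLink 5.0."),
    ("What does TCO per PFLOPS mean?",
     "Hardware cost amortized over 3 years plus power at $0.05/kWh, divided "
     "by the system's FP8 compute output in PFLOPS. Lower is better."),
]

faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for question, answer in faq
    ],
}

print(json.dumps(faq_page, indent=2))
```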
See also
Keep exploring the compute graph