Every System · Tracked
Every rack-scale AI system from every manufacturer. GPU counts, FP8 PFLOPS, power draw, cooling type, TCO per PFLOPS, and known datacenter deployments · sourced from spec sheets, earnings calls, and disclosed infrastructure builds.
- TCO across ranked systems spans $973 to $14,734 per PFLOPS per year
- DGX GB300 NVL72 offers the lowest TCO at $973/PFLOPS/year
- Liquid-cooled systems average 6% lower TCO than air-cooled
- 4 of 8 ranked systems are NVIDIA-based
Systems table
10 systems · sorted by FP8 PFLOPS · all manufacturers
| System | Manufacturer | GPUs | FP8 PFLOPS |
|---|---|---|---|
| TPU v5p Pod | Google | 8,960 | 8,100 |
| DGX GB300 NVL72 | NVIDIA | 72 | 1,440 |
| DGX GB200 NVL72 | NVIDIA | 72 | 720 |
| TPU v6e Pod | Google | 256 | 230 |
| Maia 100 Rack | Microsoft | 32 | 96 |
| DGX B200 | NVIDIA | 8 | 80 |
| MI325X Platform | AMD | 8 | 48 |
| Trn2 UltraServer | AWS | 16 | 48 |
| HGX H100 | NVIDIA | 8 | 32 |
| CS-3 | Cerebras | 1 | TBD |
TCO comparison
$/PFLOPS/year · hardware amortized 3yr + power at $0.05/kWh · lower is better
Cooling breakdown
Liquid vs air · power stats · efficiency comparison
Liquid cooling enables PUE of 1.05 to 1.15 vs 1.3 to 1.5 for air cooling. At datacenter scale, this translates to 15 to 30% lower power costs. Liquid-cooled systems also allow higher GPU density per rack, reducing interconnect latency and physical footprint. Every major new AI rack (DGX GB200, MI350X cluster) ships liquid-cooled by default.
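A rough sketch of what that PUE gap means in power cost for a single rack (the 120 kW rack draw is an assumption for illustration; the $0.05/kWh rate reuses the TCO methodology above):

```python
# Rough annual power-cost comparison for one rack under different PUE values.
# Assumptions (illustrative): 120 kW IT load per rack, $0.05/kWh, 24/7 operation.

HOURS_PER_YEAR = 8760
IT_LOAD_KW = 120          # rack IT (chip + server) power draw, assumed
PRICE_PER_KWH = 0.05      # average datacenter rate used elsewhere on this page

def annual_power_cost(it_load_kw: float, pue: float) -> float:
    """Facility power cost per year: IT load scaled by PUE overhead."""
    return it_load_kw * pue * HOURS_PER_YEAR * PRICE_PER_KWH

liquid = annual_power_cost(IT_LOAD_KW, pue=1.10)   # mid-range liquid-cooled PUE
air = annual_power_cost(IT_LOAD_KW, pue=1.40)      # mid-range air-cooled PUE

print(f"liquid-cooled: ${liquid:,.0f}/yr")
print(f"air-cooled:    ${air:,.0f}/yr")
print(f"savings:       {1 - liquid / air:.0%}")
```

At these midpoint PUEs the gap is about 21%, inside the 15 to 30% range quoted above.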
Known deployments
19 disclosed deployments · who is building what
| Operator | System |
|---|---|
| Oracle Cloud | |
All systems
10 systems · 6 manufacturers
- TPU v5p Pod: Google's high-performance TPU pod for large-scale training. 8,960 TPU v5p chips in a single pod connected via ICI 3.0 fabric. Powers Gemini ...
- DGX GB300 NVL72: Next-gen liquid-cooled rack with Blackwell Ultra GPUs. 72x B300 GPUs with 288 GB HBM3e per GPU (vs 192 GB on B200). Designed for reasoning-h...
- DGX GB200 NVL72: NVIDIA's flagship liquid-cooled rack. 72 Blackwell GPUs + 36 Grace CPUs connected via NVLink 5.0 in a single 72-GPU domain. Designed for tri...
- TPU v6e Pod: Google's latest custom AI accelerator in pod configuration. 256 TPU v6e chips connected via custom ICI (Inter-Chip Interconnect). Optimized ...
- Maia 100 Rack: Microsoft's first custom AI silicon at rack scale. Maia 100 chips fabricated at TSMC on N5 with HBM3e. Designed for Azure AI inference and f...
- DGX B200: 8-GPU Blackwell node for enterprises that don't need the full NVL72 rack. Air-cooled with NVLink 4.0 interconnect. The workhorse for inferen...
- MI325X Platform: AMD's latest 8-GPU OAM platform with MI325X accelerators. 256 GB HBM3e per GPU for the largest memory footprint in its class. Infinity Fabri...
- Trn2 UltraServer: AWS custom silicon training server. 16x Trainium2 chips in a single UltraServer node connected via NeuronLink. Designed to compete with NVID...
- HGX H100: The system that launched the AI infrastructure boom. 8x H100 SXM GPUs connected via NVLink 4.0. Still the most widely deployed AI training s...
- CS-3: Wafer-scale AI compute appliance. A single CS-3 contains one WSE-3 chip (the largest chip ever made, using an entire 300mm wafer). 44 GB of ...
Frequently asked
Pulled from the live dataset · schema-ready for AEO · a serialization sketch follows the answers below
What is a DGX GB200 NVL72?
The DGX GB200 NVL72 is NVIDIA's flagship AI compute rack. It contains 72 Blackwell GPUs and 36 Grace CPUs in a single liquid-cooled rack, connected via NVLink 5.0. It delivers up to 720 PFLOPS of FP8 compute and requires 120+ kW of power. It's the system training the largest AI models at Microsoft, Oracle, CoreWeave, and xAI.
What does TCO per PFLOPS mean?
Total Cost of Ownership per PFLOPS per year normalizes the cost of different AI systems to a comparable metric. It includes hardware cost (amortized over 3 years) plus power costs (at $0.05/kWh average datacenter rate) divided by the system's FP8 compute output in PFLOPS. Lower is better. This is the metric datacenter operators optimize when choosing between NVIDIA, AMD, Google TPU, or custom silicon.
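A minimal sketch of that calculation (the hardware price, power draw, and PFLOPS figures below are hypothetical round numbers, not values from the dataset):

```python
# TCO per PFLOPS per year =
#   (hardware cost amortized over 3 years + annual power cost) / FP8 PFLOPS
# Example inputs are hypothetical, not dataset values.

HOURS_PER_YEAR = 8760
PRICE_PER_KWH = 0.05        # average datacenter rate from the methodology above
AMORTIZATION_YEARS = 3

def tco_per_pflops_year(hardware_cost_usd: float, power_kw: float, fp8_pflops: float) -> float:
    hardware_per_year = hardware_cost_usd / AMORTIZATION_YEARS
    power_per_year = power_kw * HOURS_PER_YEAR * PRICE_PER_KWH
    return (hardware_per_year + power_per_year) / fp8_pflops

# Hypothetical rack-scale system: $3M hardware, 120 kW draw, 1,000 FP8 PFLOPS
print(f"${tco_per_pflops_year(3_000_000, 120, 1_000):,.0f} per PFLOPS per year")
```

Swap in a real system's hardware cost, power draw, and FP8 PFLOPS from the table above to reproduce the ranking.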
Why do some systems require liquid cooling?
Modern AI chips consume 700-1200W each. An NVL72 rack with 72 GPUs at 1000W each generates 72 kW of heat from GPUs alone. Air cooling cannot efficiently remove this much heat. Liquid cooling (direct-to-chip or immersion) is 10-100x more effective at heat transfer, enabling higher GPU density per rack, lower PUE (1.1 vs 1.3-1.5 for air), and higher chip performance due to better thermal management.
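A back-of-the-envelope sketch of the heat-removal gap, using heat load = mass flow x specific heat x temperature rise (the 72 kW load is from the example above; the 15 degC coolant temperature rise is an assumed value):

```python
# How much air vs. water must flow through a rack to carry away 72 kW of heat.
# Q = m_dot * c_p * delta_T; the 15 degC temperature rise is an assumption.

HEAT_LOAD_KW = 72          # 72 GPUs x ~1,000 W, from the example above
DELTA_T_C = 15             # assumed coolant temperature rise across the rack

CP_AIR = 1.006             # kJ/(kg*K)
RHO_AIR = 1.2              # kg/m^3 at ~20 degC
CP_WATER = 4.18            # kJ/(kg*K)

air_kg_per_s = HEAT_LOAD_KW / (CP_AIR * DELTA_T_C)
air_m3_per_s = air_kg_per_s / RHO_AIR
water_kg_per_s = HEAT_LOAD_KW / (CP_WATER * DELTA_T_C)    # ~1 kg of water ~ 1 liter

print(f"air:   {air_m3_per_s:.1f} m^3/s  (~{air_m3_per_s * 2118:.0f} CFM)")
print(f"water: {water_kg_per_s:.2f} L/s  (~{water_kg_per_s * 15.85:.0f} GPM)")
```

Roughly 4 m^3/s of air versus about a liter per second of water to move the same 72 kW, which is why dense racks go direct-to-chip liquid.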
How does a TPU pod compare to an NVIDIA DGX rack?
Google TPU pods and NVIDIA DGX racks are fundamentally different architectures. A TPU v5p pod can contain 8,960 chips in a single interconnected domain via ICI fabric. NVIDIA's NVL72 is a 72-GPU rack-scale system. TPU pods offer massive parallelism for Google's own model architectures but are only available on Google Cloud. DGX systems are available from multiple OEMs for on-premise deployment.
What is the Cerebras CS-3 and why is it different?
The Cerebras CS-3 uses a single WSE-3 chip, the largest chip ever made, occupying an entire 300mm silicon wafer. Instead of using HBM stacks, it has 44 GB of on-chip SRAM for zero-latency memory access. This eliminates the memory bandwidth bottleneck that limits conventional GPU systems. The tradeoff is higher per-system cost and a different programming model.
Why are hyperscalers building custom silicon?
Google (TPU), AWS (Trainium), Microsoft (Maia), and Meta (MTIA) are all building custom AI chips to reduce dependency on NVIDIA, optimize for their specific workloads, and lower costs at scale. At hyperscaler volume (hundreds of thousands of chips), even a 10-20% efficiency gain over NVIDIA translates to billions in savings. Custom silicon also provides supply chain diversification.
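The "schema-ready for AEO" note above means these Q&A pairs can be emitted as schema.org FAQPage structured data. A minimal serialization sketch (the abbreviated answers are trimmed from the full text above; the dataset's actual export format is not specified on this page):

```python
# Minimal sketch: serializing the FAQ entries above into schema.org FAQPage
# JSON-LD, the structured-data format answer engines and search crawlers read.
import json

faq = [
    ("What is a DGX GB200 NVL72?",
     "NVIDIA's flagship AI compute rack: 72 Blackwell GPUs and 36 Grace CPUs "
     "in a single liquid-cooled rack connected via NVLink 5.0."),
    ("What does TCO per PFLOPS mean?",
     "Hardware cost amortized over 3 years plus power at $0.05/kWh, divided "
     "by the system's FP8 compute output in PFLOPS. Lower is better."),
]

faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for question, answer in faq
    ],
}

print(json.dumps(faq_page, indent=2))
```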
See also
Keep exploring the compute graph