Cerebras CS-3
Appliance · Shipping · WSE-3 · 2024
Wafer-scale AI compute appliance. A single CS-3 contains one WSE-3, the largest chip ever made, built from an entire 300 mm wafer. Its 44 GB of on-chip SRAM keeps weights and activations on the wafer, avoiding the HBM bandwidth bottleneck that gates GPU inference. The appliance pairs with MemoryX for disaggregated parameter memory and SwarmX for scale-out interconnect. Cerebras bills it as the fastest per-chip inference system available.
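A back-of-envelope way to see why on-wafer SRAM matters: autoregressive decoding streams every weight once per generated token, so single-stream throughput is capped at memory bandwidth divided by model size. The sketch below is illustrative only; the 21 PB/s SRAM figure comes from Cerebras's WSE-3 materials and the 8 TB/s figure is a rough HBM3e-class assumption, neither is from this page.

```python
# Rough memory-bandwidth bound on autoregressive decode throughput.
# Assumption: each generated token streams all weights once.
# Bandwidth values are assumptions, not figures from this page:
#   21 PB/s on-wafer SRAM (Cerebras WSE-3 materials)
#    8 TB/s for an HBM3e-class GPU (rough estimate)

def max_tokens_per_sec(mem_bw_bytes_per_s: float,
                       params: float,
                       bytes_per_param: float = 2.0) -> float:
    """Upper bound: tokens/s <= bandwidth / (params * bytes/param)."""
    return mem_bw_bytes_per_s / (params * bytes_per_param)

SRAM_BW = 21e15  # assumed on-wafer SRAM bandwidth, bytes/s
HBM_BW = 8e12    # assumed GPU HBM bandwidth, bytes/s

for name, params in [("8B model", 8e9), ("70B model", 70e9)]:
    print(f"{name}: SRAM bound ~{max_tokens_per_sec(SRAM_BW, params):,.0f} tok/s, "
          f"HBM bound ~{max_tokens_per_sec(HBM_BW, params):,.0f} tok/s")
```

These are loose single-stream upper bounds; real throughput is also shaped by KV-cache traffic, compute limits, and batching.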
| Spec | Value |
| --- | --- |
| Accelerators per system | 1 (one WSE-3; no GPUs) |
| Total HBM | 0 TB (44 GB on-wafer SRAM instead) |
| Host memory | 1.5 TB |
| Interconnect | SwarmX · 12 TB/s |
| Networking | 1200 Gbps |
| Storage | MemoryX (external) |
| Form factor | 15U appliance |
| Weight | 250 kg |
| Rack units | 15U |
Performance
Manufacturer datasheet values · aggregate system compute
| Metric | Value |
| --- | --- |
| FP4 PFLOPS | TBD |
| FP8 PFLOPS | TBD |
| FP16 PFLOPS | 125 |
| BF16 PFLOPS | 125 |
| Training effective PFLOPS | 90 |
| Inference tokens/sec | 1,800 |
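The "training effective" row is the peak figure scaled by an implied utilization; a one-line check against the table's values:

```python
peak_pflops = 125       # FP16/BF16 peak from the table above
effective_pflops = 90   # training effective from the table above
print(f"Implied utilization: {effective_pflops / peak_pflops:.0%}")  # -> 72%
```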
Power and cooling
Thermal envelope · cooling requirements · efficiency
| Metric | Value |
| --- | --- |
| Rack power | 23 kW |
| Per accelerator | 23,000 W |
| Cooling | Liquid |
| PUE estimate | 1.1 |

Power draw relative to tracked systems: 23 kW of a 2,500 kW maximum.
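PUE converts IT load into at-the-wall draw; a quick check using the numbers above (the 0.9% share is against the 2,500 kW ceiling of the systems tracked on this site):

```python
it_load_kw = 23
pue = 1.1
tracked_max_kw = 2500

facility_kw = it_load_kw * pue        # 25.3 kW at the wall
share = it_load_kw / tracked_max_kw   # ~0.9% of the largest tracked draw
print(f"{facility_kw:.1f} kW facility draw, {share:.1%} of tracked max")
```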
TCO analysis
Hardware amortized over 3 years · power at $0.05/kWh
| Metric | Value |
| --- | --- |
| List price | $3,500,000 |
| Per accelerator effective | $3,500,000 |
| Cost per accelerator per month | $97,222 |
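The monthly figure is plain 36-month straight-line amortization of the list price; energy at the stated $0.05/kWh is comparatively negligible. A sketch of the arithmetic (the 730 h/month convention and the inclusion of PUE in the energy term are assumptions, not from this page):

```python
list_price = 3_500_000
months = 36               # 3-year amortization per the note above
power_kw = 23
pue = 1.1
kwh_price = 0.05
hours_per_month = 730     # assumption: average month

amortized = list_price / months                        # ~$97,222/mo
energy = power_kw * pue * hours_per_month * kwh_price  # ~$923/mo
print(f"Hardware: ${amortized:,.0f}/mo, energy: ${energy:,.0f}/mo")
```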
Available from: 2024
Known deployments
Disclosed in press releases, SEC filings, and conference talks
| Deployment | Quantity | Source |
| --- | --- | --- |
| Inference cloud | | Cerebras blog |
| Medical AI research | | Cerebras press release |

Sources
Every data point on this page is reproducible
Other AI systems
Compare across the system landscape
| System | Aggregate compute | Form factor | Status |
| --- | --- | --- | --- |
| 8960x Google TPU v5p | 8,100 PFLOPS | Pod / cluster | Shipping |
| 72x NVIDIA B300 | 1,440 PFLOPS | Full rack | Announced |
| 72x NVIDIA B200 | 720 PFLOPS | Full rack | Shipping |
| 256x Google TPU v6e | 230 PFLOPS | Pod / cluster | Shipping |
| 32x Microsoft Maia 100 | 96 PFLOPS | Full rack | Ramping |
| 8x NVIDIA B200 | 80 PFLOPS | Server node | Shipping |