Cerebras CS-3

Appliance · Shipping · WSE-3 · 2024

Wafer-scale AI compute appliance. A single CS-3 contains one WSE-3 chip, the largest chip ever made, built from an entire 300 mm wafer. Its 44 GB of on-chip SRAM keeps weights and activations on-die, removing the HBM bottleneck and the memory wall that comes with it. The system pairs with MemoryX for disaggregated parameter storage and SwarmX for the interconnect fabric. Cerebras markets it as the fastest per-chip inference hardware available.

GPUs per system: 1
GPU model: Cerebras WSE-3
GPU count: 1x
CPU model: Host server, 1x
Memory type: SRAM (on-die)
Total HBM: 0 TB
Host memory: 1.5 TB
Interconnect: SwarmX, 12 TB/s
Networking: 1200 Gbps
Storage: MemoryX external
Form factor: 15U appliance
Weight: 250 kg
Rack units: 15U

Manufacturer datasheet values · aggregate system compute

FP4 PFLOPS: TBD
FP8 PFLOPS: TBD
FP16 PFLOPS: 125
BF16 PFLOPS: 125
Training effective PFLOPS: 90
Inference tokens/sec: 1,800
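The gap between the peak and "training effective" figures above implies a sustained utilization rate. A minimal sketch, assuming "training effective" means sustained throughput against the FP16 peak:

```python
# Peak vs. sustained compute from the datasheet values above.
# Assumption: training-effective / peak = sustained utilization fraction.
peak_fp16_pflops = 125.0
training_effective_pflops = 90.0

utilization = training_effective_pflops / peak_fp16_pflops
print(f"implied sustained utilization: {utilization:.0%}")  # 72%
```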

Thermal envelope · cooling requirements · efficiency

Rack power: 23 kW
Per GPU: 23 kW (23,000 W)
Cooling: liquid
PUE estimate: 1.1
Power draw relative to tracked systems: 23 kW / 2,500 kW max
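The PUE estimate scales the rack power up to total facility draw. A short sketch of that relationship, assuming the 23 kW figure is pure IT load:

```python
# Facility-level power implied by the thermal figures above.
# Assumption: 23 kW rack power is IT load; PUE multiplies it to facility draw.
rack_power_kw = 23.0
pue = 1.1

facility_kw = rack_power_kw * pue      # total draw including cooling overhead
annual_kwh = facility_kw * 24 * 365    # energy consumed over one year

print(f"facility draw: {facility_kw:.1f} kW")   # 25.3 kW
print(f"annual energy: {annual_kwh:,.0f} kWh")  # 221,628 kWh
```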

Hardware amortized over 3 years · power at $0.05/kWh

List price: $3,500,000
Per GPU effective: $3,500,000
Cost per GPU per month: $97,222
Available from: Cerebras
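The monthly figure follows directly from the stated amortization model. A sketch reproducing it, using the page's price and power rate (hours-per-month is an assumption for the power line):

```python
# Cost model stated above: hardware amortized over 3 years, power at $0.05/kWh.
list_price = 3_500_000            # USD, from this page
months = 36                       # 3-year amortization
hardware_per_month = list_price / months

power_kw = 23.0                   # rack power from this page
pue = 1.1
price_per_kwh = 0.05              # USD, from this page
hours_per_month = 730             # assumed average month length
power_per_month = power_kw * pue * hours_per_month * price_per_kwh

print(f"hardware: ${hardware_per_month:,.0f}/month")  # $97,222/month
print(f"power:    ${power_per_month:,.0f}/month")     # $923/month
```

At this scale, power is under 1% of the amortized hardware cost, which is why the page's per-month figure is effectively the hardware line alone.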

Disclosed in press releases, SEC filings, and conference talks

Quantity: Inference cloud
Source: Cerebras blog
Quantity: Medical AI research
Source: Cerebras press release

Every data point on this page is reproducible
