Trainium 3
Trainium 3 is AWS's third-generation AI training ASIC · announced at re:Invent in December 2024 · AWS claims 2× performance per watt over Trainium 2 · powers Anthropic's AWS training clusters.
Basic
Announced at re:Invent 2024, Trainium 3 is AWS's custom AI training chip. Key specs: 3nm process (TSMC), ~860 FP8 TFLOPS per chip, HBM3e memory. It is deployed in UltraClusters of up to 100K chips connected by Elastic Fabric Adapter (EFA) networking. Anthropic signed multi-year training commitments on Trainium 3 as part of Amazon's $8B investment.
Deep
Trainium 3 targets lower training cost for frontier models. AWS claims 40% better price-performance than comparable H100 clusters at equivalent throughput. The chip uses the NeuronCore architecture, a custom AWS design that is not CUDA-compatible. The software stack is the AWS Neuron SDK with PyTorch/XLA. This is AWS's play to reduce its dependence on NVIDIA for hyperscale training workloads, with Anthropic as the flagship customer.
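A "40% better price-performance" claim is easy to misread as "40% cheaper". The sketch below works through the arithmetic under one stated assumption: that the claim means 40% more training throughput per dollar. The function names and the $10M baseline budget are hypothetical, chosen only for illustration.

```python
def relative_cost(price_perf_gain: float) -> float:
    """Cost of a fixed workload relative to the baseline.

    price_perf_gain is the fractional gain in throughput per dollar,
    e.g. 0.40 for "40% better price-performance" (assumed interpretation).
    """
    if price_perf_gain <= -1.0:
        raise ValueError("price_perf_gain must be greater than -1.0")
    return 1.0 / (1.0 + price_perf_gain)


def cost_savings(price_perf_gain: float) -> float:
    """Fraction of the baseline bill saved for the same amount of work."""
    return 1.0 - relative_cost(price_perf_gain)


if __name__ == "__main__":
    baseline_budget_usd = 10_000_000  # hypothetical training budget on the baseline cluster
    gain = 0.40                       # the price-performance claim quoted in the text
    print(f"relative cost: {relative_cost(gain):.3f}")   # 0.714
    print(f"savings:       {cost_savings(gain):.3f}")    # 0.286
    print(f"same job costs ${baseline_budget_usd * relative_cost(gain):,.0f}")
```

Under this reading, the same training run costs about 71% of the baseline, a saving of roughly 29% rather than 40%.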
Expert
Trainium 3 cluster scale: up to 100K chips in Project Rainier (a joint effort with Anthropic), interconnected via EFA v3 at 3.2 TB/s per chip. The Neuron compiler supports PyTorch and JAX through XLA; native CUDA code does not run. Performance per watt is ~2× Trainium 2, but absolute TFLOPS trail the H200, so the chip competes on price per training-hour rather than raw performance. General availability is expected in Q2 2026; Anthropic has reserved bulk capacity.
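Because CUDA code does not run, a PyTorch training script reaches the accelerator through an XLA device instead of `.cuda()`. The following is a minimal sketch of that pattern, not AWS's reference code. It assumes the `torch_xla` package (installed alongside the Neuron SDK's PyTorch support) exposes the device through `xm.xla_device()`, and it falls back to CPU when `torch_xla` is absent. The helper names `resolve_device_name` and `run_one_training_step` are hypothetical.

```python
import importlib.util


def resolve_device_name() -> str:
    """Return "xla" when the torch_xla package is importable, else "cpu"."""
    return "xla" if importlib.util.find_spec("torch_xla") is not None else "cpu"


def run_one_training_step() -> float:
    """Run a single SGD step on a toy linear model and return the loss."""
    import torch  # imported lazily so resolve_device_name() works without PyTorch

    if resolve_device_name() == "xla":
        import torch_xla.core.xla_model as xm
        device = xm.xla_device()      # XLA device handle; no CUDA involved
    else:
        xm = None
        device = torch.device("cpu")  # fallback for machines without torch_xla

    model = torch.nn.Linear(8, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    inputs = torch.randn(16, 8, device=device)
    targets = torch.randn(16, 1, device=device)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    if xm is not None:
        xm.mark_step()  # hand the lazily traced graph to the XLA compiler
    return loss.item()


if __name__ == "__main__":
    print("device:", resolve_device_name())
```

The only device-specific lines are the device lookup and `mark_step()`; custom CUDA kernels or `.cuda()` calls elsewhere in a codebase would still need to be ported. If `torch_xla` is installed on a machine with no XLA device attached, the device lookup may raise, which this sketch does not handle.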
Depending on why you're here
- 3rd-gen AWS training ASIC · 3nm TSMC
- ~860 FP8 TFLOPS · HBM3e
- NeuronCore architecture · not CUDA-compatible
- Access via AWS only · no retail
- Requires AWS Neuron SDK + PyTorch XLA
- Best for training workloads already on AWS
- AWS's counter-bet to NVIDIA dominance · key for AWS margin protection
- Anthropic commitment anchors Trainium revenue
- Gross margin on Trainium inference is much higher than on resold NVIDIA GPU capacity
- Amazon's own AI training chip · alternative to NVIDIA
- Used to train Claude models on AWS
- Part of Amazon's $8B Anthropic investment
Trainium 3 is AWS's bet that Anthropic's training needs can anchor a real NVIDIA alternative. The 100K-chip Project Rainier cluster proves the concept; execution is the next question.