ChipsReading · ~3 min read

Inferentia 3

TL;DR

Inferentia 3 is AWS's third-gen inference ASIC · launched late 2024 to serve LLMs like Claude + Llama at scale on Bedrock.

Level 1

Inferentia 3 targets LLM inference on AWS Bedrock. Specs: 5nm process, 2 NeuronCores v3, 128GB HBM3, optimized for low-latency decode. Used primarily inside AWS Bedrock · customers rarely provision Inferentia instances directly · they consume model endpoints backed by Inferentia.
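Since customers see endpoints rather than chips, the consumption path is an ordinary Bedrock API call. A minimal sketch with boto3 · the model ID and prompt are illustrative, and the live call (which needs AWS credentials and Bedrock model access) is left commented:

```python
import json

# Request body in the Anthropic messages format Bedrock expects.
# Nothing here references the chip · the hardware behind the
# endpoint is invisible to the caller.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Summarize Inferentia 3 in one line."}
    ],
})

# Live call · requires credentials; model ID is illustrative:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0",
#     body=body,
# )
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

The caller picks a model ID, not an instance type · which is exactly the abstraction that lets AWS swap Inferentia in behind "optimized" SKUs.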

Level 2

Inferentia 3's design prioritizes low cost per token over peak throughput. AWS claims 60% lower cost per token vs equivalent H100 deployments for Llama 3 70B-class models. Shipping primarily inside Bedrock · end users see it as cheaper Bedrock pricing on "optimized" model SKUs. Software: the Neuron SDK, whose compilation and serving stack fills the role that vLLM and TensorRT-LLM play on GPUs.
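The 60% claim, as arithmetic · the H100 baseline price below is a placeholder, not a published figure:

```python
# "60% lower cost per token" applied to a hypothetical baseline.
h100_cost_per_1m_tokens = 1.00  # placeholder baseline, $/1M tokens
inf3_cost_per_1m_tokens = h100_cost_per_1m_tokens * (1 - 0.60)
print(f"${inf3_cost_per_1m_tokens:.2f} per 1M tokens")  # → "$0.40 per 1M tokens"
```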

Level 3

Inferentia 3's compute-to-memory ratio is tuned for autoregressive decode (memory-bandwidth bound, not compute bound). 128GB HBM3 per chip allows single-chip serving of 70B-class models with 8-bit quantization. NeuronCores v3 add better support for sparse attention and grouped-query attention · reflecting 2024 LLM architecture trends. Not available outside AWS · strategic lock-in.
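Why decode is bandwidth bound, as a back-of-envelope: generating each token reads every weight once, so single-stream tokens/sec is roughly capped by memory bandwidth divided by model size. The bandwidth figure below is an assumption for illustration · this card quotes capacity (128GB HBM3), not bandwidth:

```python
# Decode ceiling ≈ memory_bandwidth / model_bytes for one stream.
model_params = 70e9       # 70B-class model
bytes_per_param = 1       # 8-bit quantization
hbm_bandwidth = 3e12      # assumed ~3 TB/s HBM3 · illustrative, not published

model_bytes = model_params * bytes_per_param  # 70 GB · fits in 128GB HBM3
decode_ceiling = hbm_bandwidth / model_bytes  # tokens/sec upper bound
print(f"{decode_ceiling:.1f} tok/s per stream")  # → "42.9 tok/s per stream"
```

Batching amortizes the weight reads across streams, which is why peak throughput and single-stream latency pull a chip's design in different directions.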

The takeaway for you
If you are a
Researcher
  • AWS 3rd-gen inference chip · 5nm
  • 128GB HBM3 per chip
  • Optimized for autoregressive decode
If you are a
Builder
  • Consumed via Bedrock · rarely directly provisioned
  • AWS Bedrock uses Inferentia 3 for "optimized" SKUs
  • Lower per-token price vs equivalent H100 Bedrock
If you are a
Investor
  • AWS margin lever on Bedrock
  • Hardware-level differentiation vs Azure/GCP equivalents
  • Anthropic, Meta, Cohere all use Bedrock · indirect Inferentia 3 exposure
If you are a
Curious · Normie
  • Amazon's inference chip · runs AI models on AWS
  • Makes AI serving cheaper on AWS Bedrock
  • Customers don't see it directly
Gecko's take

Inferentia 3 is the quiet margin lever on Bedrock · most customers don't know they're using it.

Via AWS Bedrock "optimized" model SKUs · AWS provisions the hardware behind the scenes. Not directly accessible to most customers.