Llama
Meta's open-weight model family · the most-deployed open LLM on earth, with 1B+ downloads.
Basic
Llama (Large Language Model Meta AI) has powered the open-weight AI ecosystem since 2023. Versions: Llama 1 (Feb 2023), Llama 2 (Jul 2023), Llama 3 (Apr 2024), Llama 3.1 (Jul 2024), Llama 4 (Apr 2025). Meta ships the weights under a custom license (commercial use allowed with some restrictions). The open release has made Llama the foundation layer for thousands of downstream models and products.
Deep
Llama architecture: decoder-only transformer with RoPE positional encoding, SwiGLU activation, RMSNorm. Modern variants (Llama 3.1+) use Grouped-Query Attention and large vocabularies (128K). Llama 4 is an MoE architecture with 10M token context (experimental). Meta trains on their own infrastructure · 24K H100 clusters for Llama 3, 70K+ for Llama 4. Tier structure: 8B (mobile/edge), 70B (production workhorse), 405B+ (frontier). The ecosystem includes Llama Guard (safety), Code Llama (specialized), and numerous third-party fine-tunes (Dolphin, Hermes, OpenChat, etc.).
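The building blocks named above (RMSNorm instead of LayerNorm, SwiGLU instead of a plain MLP) can be sketched in a few lines of NumPy. This is a toy illustration with made-up small dimensions, not Meta's implementation; the real hidden/FFN sizes for the 8B model are 4096 and 14336.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: scale by reciprocal root-mean-square; no mean subtraction,
    # no bias (both present in classic LayerNorm)
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU MLP: silu(x @ W_gate) gates (x @ W_up), then project back down
    silu = lambda z: z / (1 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
hidden, ffn = 64, 256  # toy dims; Llama 3 8B uses 4096 / 14336
x = rng.normal(size=(4, hidden))
out = swiglu(rms_norm(x, np.ones(hidden)),
             rng.normal(size=(hidden, ffn)),
             rng.normal(size=(hidden, ffn)),
             rng.normal(size=(ffn, hidden)))
print(out.shape)  # (4, 64)
```

Dropping the mean-centering and bias is what makes RMSNorm cheaper than LayerNorm at these scales; the gated SwiGLU MLP is why the FFN has three weight matrices instead of two.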
Expert
Llama 3 architecture details (publicly disclosed): 8B has 32 layers, 4096 hidden, 32 attention heads, 8 KV heads (GQA), 128K vocab with tiktoken-like tokenizer, 8K context (extended to 128K via RoPE scaling in 3.1). 70B has 80 layers, 8192 hidden. 405B has 126 layers, 16K hidden. Training: 15T tokens for 3.1, estimated 30T+ for Llama 4. Post-training: SFT + DPO + RLHF blend, with iterative quality filtering. License: "Llama Community License" · commercial use allowed with some restrictions around 700M+ monthly active users and model-output-usage reporting. The open release has had outsized impact on research and deployment.
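The disclosed dimensions above are enough to reproduce the 8B parameter count as back-of-envelope arithmetic. Two numbers below are assumptions not stated in this section: the exact vocab size (128,256) and the FFN intermediate dimension (14,336), both publicly reported for Llama 3 8B.

```python
# Back-of-envelope parameter count for Llama 3 8B from the disclosed dims.
# Assumed (publicly reported, not stated above): vocab=128_256, ffn_dim=14_336.
layers, hidden, heads, kv_heads = 32, 4096, 32, 8
vocab, ffn_dim = 128_256, 14_336
head_dim = hidden // heads  # 128

attn = hidden * hidden                        # W_q
attn += 2 * hidden * (kv_heads * head_dim)    # W_k, W_v: GQA, only 8 KV heads
attn += hidden * hidden                       # W_o
mlp = 3 * hidden * ffn_dim                    # gate, up, down projections
per_layer = attn + mlp

# untied input embedding + output head, norms omitted (negligible)
total = layers * per_layer + 2 * vocab * hidden
print(f"{total / 1e9:.2f}B")  # prints 8.03B
```

Note how GQA shows up directly in the arithmetic: W_k and W_v are 4x smaller than W_q, which is also why the KV cache shrinks by the same factor at inference time.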
Llama 4's 10M-token context and MoE architecture closed much of the gap to closed frontier models · open-weight deployment accelerated through 2026.
Depending on why you're here
- RoPE, SwiGLU, RMSNorm, GQA · the open-source transformer stack
- Training scale 15-30T tokens · estimated 5-10% of GPT-5 class compute
- Meta's release cadence drives open-source research tempo
- Llama 3.1 70B for self-hosted production · Llama 4 MoE if you can afford the VRAM
- Quantized variants (GGUF, AWQ) run on consumer hardware
- Fine-tune via LoRA · massive community of ready-made adapters
- Llama's open release commoditizes the base-model layer
- Benefits: Meta retains research talent, accelerates internal AI product launches
- Downside: reduces closed-model API pricing power long-term
- Meta's (formerly Facebook) AI · you can download and run it yourself
- Powers open-source AI apps all over the internet
- Free to use with some conditions
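The GQA mentioned in the lists above boils down to one broadcast: many query heads share a few KV heads. A minimal NumPy sketch of that sharing, using the 32-query/8-KV ratio of Llama 3 8B but a toy head dimension:

```python
import numpy as np

# GQA sketch: 32 query heads share 8 KV heads (the Llama 3 8B ratio), so the
# KV cache is 4x smaller; each KV head is broadcast across a group of 4 Q heads.
n_q_heads, n_kv_heads, head_dim, seq = 32, 8, 16, 5  # toy head_dim and seq
group = n_q_heads // n_kv_heads  # 4 query heads per KV head

k = np.random.default_rng(0).normal(size=(n_kv_heads, seq, head_dim))
k_expanded = np.repeat(k, group, axis=0)  # (32, seq, head_dim) for attention

print(k_expanded.shape)                          # (32, 5, 16)
assert np.array_equal(k_expanded[0], k_expanded[3])  # heads 0-3 share KV head 0
```

The same repeat applies to V. Only the 8-head K/V tensors are ever cached, which is where GQA's memory savings at long context come from.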
Llama is the reason open-source AI isn't stuck at 2023-level quality. Meta's bet pays off every time a researcher ships on top of it.