Fine-tuning
Training a base model on a smaller, specific dataset to teach it a new skill or voice.
Basic
Fine-tuning takes a pre-trained foundation model and continues training it on task-specific data. Common use cases: customer-support tone, domain knowledge (legal, medical), structured output formats, and specific language dialects. Methods range from full fine-tuning (update all parameters, expensive) to LoRA or QLoRA (update only a small adapter, cheap); as of 2026, most production fine-tuning uses LoRA.
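A minimal sketch of the LoRA idea with toy dimensions (all names and sizes here are illustrative, not a real training loop): the base weight W stays frozen, only a small low-rank adapter BA is trained, and discarding the adapter recovers the base model exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 8                 # toy layer dims and LoRA rank (illustrative)

W = rng.normal(size=(d, k))         # frozen base weight: never updated
A = rng.normal(size=(r, k)) * 0.01  # trainable adapter factor
B = np.zeros((d, r))                # zero-initialized, so BA starts as a no-op

def forward(x, use_adapter=True):
    """Base output plus the low-rank LoRA delta B @ A."""
    delta = B @ A if use_adapter else np.zeros((d, k))
    return (W + delta) @ x

x = rng.normal(size=k)
# Until B is trained away from zero, the adapter changes nothing:
assert np.allclose(forward(x, True), forward(x, False))
```

"Reversible" here is literal: serving with `use_adapter=False` (or deleting A and B) gives back the untouched base model.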
Deep
Fine-tuning recipes: Supervised Fine-Tuning (SFT) on labeled demonstrations is the most common. Direct Preference Optimization (DPO) trains on preference pairs and is often preferred over RLHF for alignment because it skips reward modeling; RLHF is still used for safety-critical alignment. LoRA (Low-Rank Adaptation) trains small adapter matrices while freezing the base weights, making it 10-100× cheaper than a full fine-tune and fully recoverable to the base model. QLoRA adds 4-bit quantization to LoRA for consumer-GPU training. Cost: a full fine-tune of a 7B model runs roughly $5-20K on rented H100s; the same workload with LoRA runs roughly $50-500. Adapter-based fine-tuning is reversible: discard the adapter and you are back to the base model.
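SFT is ordinary next-token cross-entropy on the demonstration data. A toy sketch of that objective (tiny vocabulary and hand-written logits, not a real model):

```python
import math

def sft_loss(logits, target_ids):
    """Mean cross-entropy of the demonstration tokens under the model's logits.

    logits: per-position lists of raw scores over the vocabulary
    target_ids: the demonstration tokens the model should reproduce
    """
    total = 0.0
    for scores, t in zip(logits, target_ids):
        log_z = math.log(sum(math.exp(s) for s in scores))
        total += log_z - scores[t]      # -log p(target token)
    return total / len(target_ids)

# A model that scores the demo tokens highly gets low loss:
good = sft_loss([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]], [0, 1])
bad  = sft_loss([[0.0, 5.0, 0.0], [5.0, 0.0, 0.0]], [0, 1])
assert good < bad
```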
Expert
LoRA approximates the fine-tune delta ΔW as BA, where B ∈ R^{d×r} and A ∈ R^{r×k} with r ≪ min(d, k). Typical ranks are r = 8-64; at rank 16 the adapter holds under 0.1% of the full weight count. QLoRA applies NF4 quantization to the frozen base so fine-tuning fits on single-GPU setups, at a quality cost of roughly 1-2 benchmark points. DPO loss: -log σ(β(log π_θ(y_w|x) - log π_θ(y_l|x) - log π_ref(y_w|x) + log π_ref(y_l|x))). Catastrophic forgetting is the main risk; mitigate it with low learning rates (~1e-5), few epochs (1-3), and by mixing general-purpose data into the training set.
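Both the DPO loss above and the adapter-size claim can be checked numerically. A sketch with illustrative log-probabilities and a single 4096×4096 weight matrix (real fractions depend on which matrices LoRA targets):

```python
import math

def dpo_loss(beta, pi_w, pi_l, ref_w, ref_l):
    """-log σ(β((log π_θ(y_w) - log π_θ(y_l)) - (log π_ref(y_w) - log π_ref(y_l)))).

    Arguments are log-probabilities of the chosen (w) and rejected (l) responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_w - pi_l) - (ref_w - ref_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Widening the policy's preference margin over the reference lowers the loss:
assert dpo_loss(0.1, -1.0, -5.0, -2.0, -2.0) < dpo_loss(0.1, -2.0, -2.0, -2.0, -2.0)

# Adapter size at rank 16 for one 4096x4096 weight matrix:
d = k = 4096
r = 16
adapter = r * (d + k)          # parameters in B (d x r) plus A (r x k)
full = d * k
assert adapter / full < 0.01   # ~0.8% of this matrix; across a whole model
                               # (embeddings, MLPs untouched) it falls below 0.1%
```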
Depending on why you're here
- SFT, DPO, and RLHF are the three common recipes
- LoRA/QLoRA for parameter-efficient tuning
- Catastrophic forgetting is the main pitfall
- Start with LoRA: cheapest, fastest, reversible
- Need ~1K-10K high-quality examples for most tasks
- Fine-tune when RAG can't solve it: tone, format, or latency-critical knowledge
- Fine-tuning infra (Together, Replicate, Hugging Face) is commoditizing fast
- The real moat is in domain-specific datasets and tuning recipes
- Enterprise fine-tuning revenue concentrates at closed-model providers
- Teaching a smart AI a new trick without retraining it from scratch
- Cheaper than building your own AI
- Turns a general-purpose model into a specialist
Often confused with
RAG injects context at query time with no weight changes; fine-tuning changes the weights. Use RAG for fast-changing knowledge, fine-tuning for stable patterns.
Prompt engineering changes the inputs; fine-tuning changes the model itself. Prompts are free; fine-tuning costs money but yields more reliable behavior.
Most teams reaching for fine-tuning should reach for better RAG first. When RAG isn't enough, try LoRA before a full fine-tune, always.
Read the primary sources
- LoRA paper (Microsoft, 2021), arxiv.org
- DPO paper (Stanford, 2023), arxiv.org
- QLoRA paper (UW, 2023), arxiv.org