
Fine-tuning

TL;DR

Training a base model on a smaller, specific dataset to teach it a new skill or voice.

Level 1

Fine-tuning takes a pre-trained foundation model and continues training it on task-specific data. Use cases: customer-support tone, domain knowledge (legal, medical), structured output formats, specific language dialects. Methods range from full fine-tuning (update all parameters, expensive) to LoRA or QLoRA (update only a small adapter, cheap). Most production fine-tuning uses LoRA in 2026.
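The adapter idea behind LoRA can be sketched in a few lines of NumPy (illustrative shapes only, not a real training loop): the pre-trained weight W is frozen, and only the two small low-rank factors B and A are trained, with their product added on the forward pass.

```python
import numpy as np

d, k, r = 4096, 4096, 16          # layer dims and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))   # frozen pre-trained weight, never updated
B = np.zeros((d, r))              # trainable; zero init so the delta starts at 0
A = rng.standard_normal((r, k)) * 0.01  # trainable

def forward(x):
    # base path plus low-rank adapter path: y = xW + x(BA)
    return x @ W + (x @ B) @ A

# The adapter is a tiny fraction of the full weight count
# (per adapted matrix; the whole-model fraction depends on
# which matrices get adapters).
full_params = d * k
lora_params = d * r + r * k
print(lora_params / full_params)  # 0.0078125 at rank 16
```

Discarding B and A recovers the base model exactly, which is why LoRA fine-tuning is reversible.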

Level 2

Fine-tuning recipes: Supervised Fine-Tuning (SFT) trains on labeled demonstrations and is the most common. Direct Preference Optimization (DPO) trains on preference pairs; it is preferred over RLHF for alignment because it skips reward modeling, though RLHF is still used for safety-critical alignment. LoRA (Low-Rank Adaptation) trains a small adapter matrix while freezing the base weights, making it 10-100× cheaper than a full fine-tune, and the base model is recoverable by simply discarding the adapter. QLoRA adds 4-bit quantization to LoRA for consumer-GPU training. Cost: a full fine-tune of a 7B model runs roughly $5-20K on rented H100s; the same workload with LoRA runs roughly $50-500.
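The 10-100× savings claim can be sanity-checked with back-of-envelope arithmetic. The layer count, width, and choice of adapted matrices below are illustrative assumptions, not any specific model's config:

```python
# Trainable parameters: full fine-tune vs LoRA at rank 16,
# for a hypothetical 7B-class transformer (32 layers, d_model 4096).
n_layers, d_model, rank = 32, 4096, 16
full_model_params = 7_000_000_000             # every weight updated in full FT

# LoRA adapters on the four attention projections (q, k, v, o) per layer:
per_matrix = d_model * rank + rank * d_model  # B is d×r, A is r×d
lora_trainable = n_layers * 4 * per_matrix

print(f"{lora_trainable:,}")                  # 16,777,216 trainable params
print(f"{lora_trainable / full_model_params:.4%}")  # ≈ 0.24% of the model
```

The exact fraction depends on which matrices receive adapters; adapting fewer matrices, or a larger base model, pushes it lower still. The optimizer-state savings (no Adam moments for 7B frozen weights) are where most of the cost reduction comes from.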

Level 3

LoRA approximates the fine-tune delta ΔW as BA, where B ∈ R^{d×r} and A ∈ R^{r×k} with r ≪ min(d, k); typical ranks are r = 8-64. At rank 16 the adapter holds under 0.1% of the full weight count. QLoRA adds NF4 quantization of the frozen base so fine-tuning fits on single-GPU setups, at a cost of roughly 1-2 benchmark points in quality. DPO loss: -log σ(β · (log π_θ(y_w|x) - log π_θ(y_l|x) - log π_ref(y_w|x) + log π_ref(y_l|x))). Catastrophic forgetting is the main risk; mitigate it with low learning rates (~1e-5), few epochs (1-3), and mixing general-purpose data into the training set.
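The DPO loss above is simple enough to compute directly. A minimal sketch for a single preference pair, using only the log-probabilities named in the formula:

```python
import math

def dpo_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1):
    """DPO loss for one preference pair.

    pi_w, pi_l:   policy log-probs of the chosen / rejected response
    ref_w, ref_l: frozen reference-model log-probs of the same responses
    """
    margin = beta * ((pi_w - pi_l) - (ref_w - ref_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# If the policy matches the reference, the margin is 0 and loss = log 2.
print(dpo_loss(-5.0, -6.0, -5.0, -6.0))  # ≈ 0.6931
# Shifting probability toward the preferred response lowers the loss.
print(dpo_loss(-4.0, -7.0, -5.0, -6.0))  # < 0.6931
```

Note there is no reward model anywhere in the computation, which is exactly why DPO is cheaper to run than RLHF.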

The takeaway for you
If you are a Researcher
  • SFT, DPO, and RLHF are the three common recipes
  • LoRA/QLoRA for parameter-efficient tuning
  • Catastrophic forgetting is the main pitfall
If you are a Builder
  • Start with LoRA: cheapest, fastest, reversible
  • Need ~1K-10K high-quality examples for most tasks
  • Fine-tune when RAG can't solve it: tone, format, or latency-critical knowledge
If you are an Investor
  • Fine-tuning infra (Together, Replicate, Hugging Face) is commoditizing fast
  • The real moat is in domain-specific datasets and tuning recipes
  • Enterprise fine-tuning revenue concentrates at closed-model providers
If you are a Curious Normie
  • Teaching a smart AI a new trick without retraining it from scratch
  • Cheaper than building your own AI
  • Turns a general-purpose model into a specialist
Don't mix them up
Fine-tuning vs RAG

RAG injects context at query time with no weight changes; fine-tuning changes the weights. Use RAG for fast-changing knowledge, fine-tuning for stable patterns.

Fine-tuning vs Prompt engineering

Prompt engineering changes the inputs; fine-tuning changes the model itself. Prompts are free, while fine-tuning costs money but yields more reliable behavior.

Gecko's take

Most teams reaching for fine-tuning should reach for better RAG first. When RAG isn't enough, use LoRA before a full fine-tune, always.

1,000-10,000 high-quality examples is the typical sweet spot. Quality > quantity. Poor data degrades the base model.