Fine-tuning adapts a pre-trained model to a specific domain or task by continuing gradient updates on a smaller dataset.
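As a concrete (if toy) illustration, the sketch below continues gradient updates on a small synthetic dataset; `base_model`, the data, and all hyperparameters are hypothetical stand-ins for a real pre-trained network and domain corpus.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: a "pre-trained" network and a small domain dataset.
torch.manual_seed(0)
base_model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
features = torch.randn(512, 128)
labels = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# Continue gradient updates with a small learning rate so the model adapts to the
# new data without drifting far from its pre-trained weights.
optimizer = torch.optim.AdamW(base_model.parameters(), lr=1e-5, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

base_model.train()
for epoch in range(1):                     # single epoch, matching the comparison below
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(base_model(x), y)
        loss.backward()
        optimizer.step()
```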
Fine-Tuning Flavors
The flavors compared below are full fine-tuning, low-rank adapters (LoRA), and prompt-tuning.
Resource Comparison (7B model, 1 epoch)
Design Trade-offs
- Full fine-tuning updates every weight; it delivers the highest quality but risks catastrophic forgetting of pre-training behavior.
- LoRA lowers training cost but adds adapter matmuls at inference unless they are merged (see the sketch after this list).
- Prompt-tuning keeps the base weights frozen and trains only soft prompt embeddings, making it well suited to on-device personas.
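A minimal sketch of the LoRA idea behind the second bullet, assuming a hypothetical `LoRALinear` wrapper (not any particular library's class): the base `nn.Linear` stays frozen and only two small factors `A` and `B` receive gradients.

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA wrapper: frozen base layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.scale = alpha / rank
        # Common LoRA init: A small random, B zero, so the layer starts identical to the base.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the low-rank adapter path; the extra matmuls here are
        # the runtime cost the bullet above refers to, unless the adapter is merged.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Hypothetical usage: wrap a projection and train only the adapter parameters.
layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
trainable = [p for p in layer.parameters() if p.requires_grad]   # just A and B
```

Because only `A` and `B` require gradients, adapter checkpoints stay small and the base model is shared across tasks.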
Current Trends (2025)
- FP8 optimizer states cut memory use by 40%.
- Alignment-aware fine-tuning adds reward-model logits as an auxiliary loss term.
- LoRA v3 introduces merged adapters that remove the extra adapter matmuls at inference (the merge is sketched below).
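The merge the last bullet describes is, in essence, folding the low-rank product back into the frozen weight before serving. A small numerical sketch with hypothetical shapes and plain tensors (no particular library assumed) shows why the two paths give identical outputs:

```python
import torch

torch.manual_seed(0)
out_f, in_f, rank, scale = 32, 64, 8, 2.0          # hypothetical shapes and scaling
W = torch.randn(out_f, in_f)                       # frozen base weight
A = torch.randn(rank, in_f) * 0.01                 # low-rank adapter factors
B = torch.randn(out_f, rank)
x = torch.randn(4, in_f)

# Adapter path at inference: two extra matmuls per wrapped layer.
y_adapter = x @ W.T + scale * (x @ A.T @ B.T)

# Merged path: fold the update into the weight once, then serve with a single matmul.
W_merged = W + scale * (B @ A)
y_merged = x @ W_merged.T

assert torch.allclose(y_adapter, y_merged, atol=1e-4)
```

The merge is exact, so it costs no quality; the trade-off is that a merged checkpoint can no longer hot-swap between adapters without re-merging.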
Implementation Tips
- Start with LoRA rank 8; increase the rank only if the evaluation score drops by more than 2 points.
- Evaluate on out-of-domain safety prompts to catch regressions.
- Distribute only diff checkpoints to respect restrictive open-weight licenses.
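For the last tip, a minimal sketch of what a diff checkpoint could look like; `diff_checkpoint` and `apply_diff` are hypothetical helpers over plain PyTorch state dicts, not any library's API:

```python
import torch

def diff_checkpoint(base_state: dict, tuned_state: dict) -> dict:
    # Store per-tensor deltas instead of the tuned weights themselves, so the base
    # model's weights are never redistributed (for LoRA runs, shipping just the
    # adapter tensors achieves the same thing with far less data).
    return {k: tuned_state[k] - base_state[k] for k in tuned_state}

def apply_diff(base_state: dict, diff: dict) -> dict:
    # Reconstruct the fine-tuned weights on the recipient's side.
    return {k: base_state[k] + diff[k] for k in diff}

# Hypothetical usage:
# torch.save(diff_checkpoint(base.state_dict(), tuned.state_dict()), "domain-diff.pt")
# restored = apply_diff(base.state_dict(), torch.load("domain-diff.pt"))
```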