Parameter-efficient tuning (PET) adapts a large model by training only a small subset of parameters, such as adapters, bias terms, or low-rank LoRA matrices, which cuts compute and storage costs relative to full fine-tuning.
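For concreteness, here is a minimal PyTorch sketch of the idea behind a LoRA-style layer: the base weight stays frozen and only two small low-rank factors are trained. The class name, shapes, and initialization are illustrative assumptions, not any particular library's implementation.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update (illustrative)."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained projection (stands in for a weight loaded from a checkpoint).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Trainable low-rank factors: delta_W = B @ A, scaled by alpha / rank.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank correction; only A and B receive gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


layer = LoRALinear(4096, 4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # ~65k, versus ~16.8M in the frozen base weight
```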
PET Methods
Memory Savings (7B model)
Design Trade-offs
- Fewer trainable parameters mean cheaper training, but too small a budget can cap achievable quality.
- At inference, adapters must either be merged into the base weights or kept separate at the cost of extra matmuls (see the sketch after this list).
- Some licenses disallow weight merging.
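A small sketch of the merge-versus-extra-matmul trade-off for a single LoRA-adapted weight. All tensors are toy stand-ins for weights that would normally come from a checkpoint.

```python
import torch

out_dim, in_dim, rank = 64, 64, 8
W = torch.randn(out_dim, in_dim)          # frozen base weight
A = torch.randn(rank, in_dim) * 0.01      # trained low-rank factors
B = torch.randn(out_dim, rank) * 0.01
scaling = 16.0 / rank
x = torch.randn(4, in_dim)

# Option 1: keep the adapter separate -> one extra matmul chain per forward pass.
y_unmerged = x @ W.T + (x @ A.T @ B.T) * scaling

# Option 2: merge once -> no inference overhead, but the result is a new full
# weight matrix, which licensing terms may treat differently from a standalone adapter.
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

print(torch.allclose(y_unmerged, y_merged, atol=1e-5))  # True: both paths compute the same output
```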
Current Trends (2025)
- QLoRA stacks LoRA adapters on a 4-bit quantized base model, reducing VRAM further (see the sketch after this list).
- Adapter fusion merges multiple task adapters on-the-fly.
- PEFT libraries standardize APIs across frameworks.
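Below is a hedged sketch of a QLoRA-style setup using the Hugging Face transformers and peft libraries. The checkpoint name and target_modules are assumptions for a typical 7B causal LM and should be adjusted for the actual model.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit (NF4) to cut VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder 7B checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach a LoRA adapter on top of the quantized weights.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical attention projections; model-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```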
Implementation Tips
- Start with LoRA rank 8; adjust after evaluation.
- Use two-stage tuning: train the LoRA adapter first, then a short bias-only pass for final polish (see the sketch after this list).
- Store adapters separately for lightweight distribution.
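A sketch of these tips with the peft library on a toy model. The Sequential stands in for a real transformer, and the output directory name is a placeholder.

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Toy stand-in for a model whose Linear layers we want to adapt.
base_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

# Stage 1: rank-8 LoRA on the Linear layers; re-run eval before raising r.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["0", "2"])
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# ... train the adapter here ...

# Stage 2 (optional polish): freeze the LoRA factors and unfreeze only bias terms.
# Note: bias deltas live in the base model, so they must be stored alongside the
# adapter if you rely on them.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias")
# ... short bias-only training pass here ...

# Store only the adapter (a few MB), not a full copy of the base weights.
model.save_pretrained("lora-adapter-r8")
```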