Model Distillation

Benched.ai Editorial Team

Model distillation transfers knowledge from a large teacher model to a smaller student, achieving faster inference with minimal quality loss.

  Distillation Types

Type                   | Teacher signals          | Student size (% of teacher)
Logit matching         | Soft probabilities       | 10–30 %
Sequence-level         | Teacher-generated text   | 30–50 %
Reinforcement distill  | Reward signals           | Variable
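
To make the logit-matching row concrete, here is a minimal PyTorch sketch of the soft-target loss. The tensor shapes, the `temperature` default, and the toy usage at the end are assumptions for illustration, not values from a specific recipe.

```python
import torch
import torch.nn.functional as F

def logit_matching_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions.

    Both tensors are assumed to be shaped (batch, vocab_size); scaling by
    temperature**2 keeps gradient magnitudes comparable across temperatures.
    """
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: random logits stand in for real teacher/student outputs.
loss = logit_matching_loss(torch.randn(8, 32000), torch.randn(8, 32000))
```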

  Quality vs Size (GPT-4 → 6B)

Student params | Task F1 delta
13 B           | −1 %
7 B            | −3 %
3 B            | −6 %

  Current Trends (2025)

  • Data-free distillation uses synthetic prompts generated by the teacher in place of a labelled corpus (sketched after this list).
  • Multi-teacher ensembles improve robustness.
  • Distill-then-quantize stacks for mobile deployments.
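
One way to realize the data-free, sequence-level recipe is to have the teacher both invent prompts and answer them, then fine-tune the student on the resulting pairs. A minimal sketch using the Hugging Face transformers API follows; the checkpoint name, seed topics, and sampling settings are placeholders, not values from this article.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder names: swap in the actual teacher checkpoint and seed topics.
TEACHER_NAME = "your-org/teacher-model"
SEED_TOPICS = ["summarize a news article", "explain a physics concept"]

tokenizer = AutoTokenizer.from_pretrained(TEACHER_NAME)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER_NAME)

def synthesize_pairs(n_per_topic: int = 4) -> list[dict]:
    """Have the teacher invent prompts, then answer them, yielding
    (prompt, response) pairs for sequence-level student fine-tuning."""
    pairs = []
    for topic in SEED_TOPICS:
        meta_prompt = (f"Write {n_per_topic} diverse user requests that each "
                       f"ask someone to {topic}, one per line:\n")
        inputs = tokenizer(meta_prompt, return_tensors="pt")
        out = teacher.generate(**inputs, max_new_tokens=256,
                               do_sample=True, temperature=0.9)
        # Decode only the continuation, not the meta-prompt itself.
        new_tokens = out[0][inputs["input_ids"].shape[1]:]
        for prompt in tokenizer.decode(new_tokens, skip_special_tokens=True).splitlines():
            if not prompt.strip():
                continue
            inputs = tokenizer(prompt, return_tensors="pt")
            answer = teacher.generate(**inputs, max_new_tokens=512)
            completion = tokenizer.decode(answer[0][inputs["input_ids"].shape[1]:],
                                          skip_special_tokens=True)
            pairs.append({"prompt": prompt, "response": completion})
    return pairs
```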

  Implementation Tips

  1. Use a temperature of 2–4 when matching logits; the softened distributions give smoother gradients.
  2. Blend hard labels and soft logits to stabilize training (see the sketch after this list).
  3. Evaluate the student on safety benchmarks to confirm distillation introduces no regression.
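
Tips 1 and 2 combine into the standard blended objective: cross-entropy on hard labels mixed with temperature-scaled KL against the teacher's soft logits. A minimal sketch, assuming teacher and student share a vocabulary and that the mixing weight `alpha` is tuned on a validation set.

```python
import torch
import torch.nn.functional as F

def blended_distillation_loss(student_logits: torch.Tensor,
                              teacher_logits: torch.Tensor,
                              hard_labels: torch.Tensor,
                              temperature: float = 3.0,
                              alpha: float = 0.5) -> torch.Tensor:
    """alpha * CE(hard labels) + (1 - alpha) * T^2 * KL(teacher || student).

    Shapes are assumed to be (batch, vocab_size) for the logits and (batch,)
    for integer hard labels; alpha and temperature are tuning knobs.
    """
    ce = F.cross_entropy(student_logits, hard_labels)
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl
```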