Model Distillation

Benched.ai Editorial Team

Model distillation transfers knowledge from a large teacher model to a smaller student, achieving faster inference with minimal quality loss.

  Distillation Types

Type                   | Teacher signals          | Student size (% of teacher)
Logit matching         | Soft probabilities       | 10–30 %
Sequence-level         | Teacher-generated text   | 30–50 %
Reinforcement distill  | Reward signals           | Variable
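
To make the logit-matching row concrete, here is a minimal PyTorch sketch of the soft-target loss. The tensor shapes, the `temperature` default, and the toy usage at the end are assumptions for illustration, not values from a specific recipe.

```python
import torch
import torch.nn.functional as F

def logit_matching_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions.

    Both tensors are assumed to be shaped (batch, vocab_size); scaling by
    temperature**2 keeps gradient magnitudes comparable across temperatures.
    """
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: random logits stand in for real teacher/student outputs.
loss = logit_matching_loss(torch.randn(8, 32000), torch.randn(8, 32000))
```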

  Quality vs Size (GPT-4 → 6B)

Student params | Task F1 delta
13 B           | −1 %
7 B            | −3 %
3 B            | −6 %

  Current Trends (2025)

  • Data-free distillation uses synthetic prompts generated by the teacher in place of a labelled corpus (sketched after this list).
  • Multi-teacher ensembles improve robustness.
  • Distill-then-quantize stacks for mobile deployments.
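
One way to realize the data-free, sequence-level recipe is to have the teacher both invent prompts and answer them, then fine-tune the student on the resulting pairs. A minimal sketch using the Hugging Face transformers API follows; the checkpoint name, seed topics, and sampling settings are placeholders, not values from this article.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder names: swap in the actual teacher checkpoint and seed topics.
TEACHER_NAME = "your-org/teacher-model"
SEED_TOPICS = ["summarize a news article", "explain a physics concept"]

tokenizer = AutoTokenizer.from_pretrained(TEACHER_NAME)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER_NAME)

def synthesize_pairs(n_per_topic: int = 4) -> list[dict]:
    """Have the teacher invent prompts, then answer them, yielding
    (prompt, response) pairs for sequence-level student fine-tuning."""
    pairs = []
    for topic in SEED_TOPICS:
        meta_prompt = (f"Write {n_per_topic} diverse user requests that each "
                       f"ask someone to {topic}, one per line:\n")
        inputs = tokenizer(meta_prompt, return_tensors="pt")
        out = teacher.generate(**inputs, max_new_tokens=256,
                               do_sample=True, temperature=0.9)
        # Decode only the continuation, not the meta-prompt itself.
        new_tokens = out[0][inputs["input_ids"].shape[1]:]
        for prompt in tokenizer.decode(new_tokens, skip_special_tokens=True).splitlines():
            if not prompt.strip():
                continue
            inputs = tokenizer(prompt, return_tensors="pt")
            answer = teacher.generate(**inputs, max_new_tokens=512)
            completion = tokenizer.decode(answer[0][inputs["input_ids"].shape[1]:],
                                          skip_special_tokens=True)
            pairs.append({"prompt": prompt, "response": completion})
    return pairs
```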

  Implementation Tips

  1. Use a temperature of 2–4 when matching logits; the softened distributions give smoother gradients.
  2. Blend hard labels and soft logits to stabilize training (see the sketch after this list).
  3. Evaluate the student on safety benchmarks to confirm distillation introduces no regression.
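
Tips 1 and 2 combine into the standard blended objective: cross-entropy on hard labels mixed with temperature-scaled KL against the teacher's soft logits. A minimal sketch, assuming teacher and student share a vocabulary and that the mixing weight `alpha` is tuned on a validation set.

```python
import torch
import torch.nn.functional as F

def blended_distillation_loss(student_logits: torch.Tensor,
                              teacher_logits: torch.Tensor,
                              hard_labels: torch.Tensor,
                              temperature: float = 3.0,
                              alpha: float = 0.5) -> torch.Tensor:
    """alpha * CE(hard labels) + (1 - alpha) * T^2 * KL(teacher || student).

    Shapes are assumed to be (batch, vocab_size) for the logits and (batch,)
    for integer hard labels; alpha and temperature are tuning knobs.
    """
    ce = F.cross_entropy(student_logits, hard_labels)
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl
```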