Knowledge distillation transfers what a large model knows into a compact one by training the student on the teacher's outputs. The result runs cheaper and faster on edge hardware, which is why distilled models are common in real-time video features.
Definition
Training a small 'student' model to copy a big 'teacher' model, so you get most of the quality at a fraction of the size, cost, and latency.
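As a rough illustration of what "copying" the teacher can mean in practice, the sketch below computes one common distillation objective: the KL divergence between the teacher's and the student's temperature-softened output distributions. The entry does not specify a loss, so this formulation, the function names, the temperature value, and the example logits are all assumptions made for illustration. Real training would compute this in a deep learning framework and usually mix it with an ordinary label loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared.

    Hypothetical helper: the temperature of 2.0 is an arbitrary illustrative choice.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Made-up logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # ranks the classes the other way round

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(loss_close < loss_far)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what pushes the student toward the teacher's behavior during training.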
Also known as
distillation, distil-whisper