Knowledge distillation transfers what a large model (the teacher) knows into a compact one (the student) by training the student to reproduce the teacher's outputs. The resulting student is cheaper and faster to run on edge hardware, which is why distilled models are common in real-time video features.
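To make "training on the teacher's outputs" concrete, here is a minimal sketch of one common way to score a student against a teacher: the KL divergence between their temperature-softened output distributions. The text above does not specify a loss, so the temperature parameter, the T² scaling, and the function names `log_softmax` and `distillation_loss` are assumptions chosen for illustration, not a prescribed method.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions (illustrative sketch).

    A temperature above 1 flattens both distributions so the student also
    sees how the teacher ranks the less likely classes. Multiplying by
    temperature**2 is a common convention (assumed here) that keeps gradient
    magnitudes comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2

# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student's logits match the teacher's and grows as the two distributions diverge. In practice this term is often combined with an ordinary cross-entropy loss on ground-truth labels, and the whole thing is minimized with a deep-learning framework rather than plain Python.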