U-Net has become an indispensable component of medical image segmentation. A defining characteristic of U-Net is that it produces multi-scale features, which provide hidden features under different views and thereby help improve semantic segmentation performance. Knowledge distillation, e.g., feature distillation or logit distillation, is a mechanism for compressing models efficiently. Feature distillation guides the student's feature learning by transferring feature information from the teacher. To supervise and distill these multi-scale features, we propose Multi-scale Feature Distillation (MFD). MFD uses the teacher's predicted logits as the distillation target and the student's multi-scale features from different layers as the supervision target. Decoupling logit distillation has recently become a trend: the original logit distillation objective can usually be divided into a target-class part and a non-target-class part, and the two parts often play different roles in feature distillation and logit distillation. We therefore introduce Decoupled Multi-scale Distillation (DMD), which assigns the two parts to different distillation paths: non-target classes are used for feature distillation, and target classes are used for logit distillation. Experiments on different datasets demonstrate that DMD is effective.
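To make the split into target and non-target classes concrete, the following is a minimal sketch of one common way to decouple a logit distillation loss for a single pixel. It is an illustration under stated assumptions, not the DMD implementation: the function names, the temperature parameter, and the choice of KL divergence are hypothetical and are not taken from the text above. The sketch relies only on the standard identity that the full KL divergence between teacher and student class distributions equals a target-class term plus a non-target-class term weighted by the teacher's non-target probability mass.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def decoupled_terms(teacher_logits, student_logits, target, temperature=1.0):
    """Split logit distillation into target and non-target terms.

    Hypothetical helper for illustration only. `target` is the index of
    the ground-truth class for this pixel.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Target-class term: binary distribution (target vs. everything else).
    b_teacher = [p_teacher[target], 1.0 - p_teacher[target]]
    b_student = [p_student[target], 1.0 - p_student[target]]
    target_term = kl_divergence(b_teacher, b_student)

    # Non-target term: distribution over the remaining classes only.
    rest_teacher = [z for i, z in enumerate(teacher_logits) if i != target]
    rest_student = [z for i, z in enumerate(student_logits) if i != target]
    non_target_term = kl_divergence(
        softmax(rest_teacher, temperature),
        softmax(rest_student, temperature),
    )
    return target_term, non_target_term


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0, 0.1]   # illustrative logits for one pixel
    student = [1.2, 0.9, -0.3, 0.4]
    tc, nc = decoupled_terms(teacher, student, target=0)
    print("target-class term:", tc)
    print("non-target-class term:", nc)
```

Because the two terms are computed separately, each can be routed to a different loss, which is the property the decoupled formulation depends on. In a segmentation setting the same computation would be applied per pixel and averaged.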