Decoupled Knowledge Distillation

📅 Published: June 1, 2022 👤 Borui Zhao, Quan Cui, Renjie Song et al. 📖 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 📊 854 citations
AI-Generated Summary

Advanced distillation methods mostly distill deep features from intermediate layers, while the significance of logit distillation has been largely overlooked. The paper demonstrates the potential of logit distillation, and its authors hope the result will be helpful for future research.


Key Findings
  • To offer a new way of studying logit distillation, the authors reformulate the classical KD loss into two parts: target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD).
  • They empirically investigate and validate the effects of the two parts: TCKD transfers knowledge about the “difficulty” of training samples, while NCKD is the main reason logit distillation works.
  • More importantly, they show that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance the two parts.
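
The split described above can be checked numerically. The sketch below is an illustration in plain Python, not the authors' implementation. It assumes the classical KD loss is the KL divergence between teacher and student softmax outputs, and it uses the identity KD = TCKD + (1 - p_t) * NCKD, where p_t is the teacher's probability for the target class. All function names and the `alpha`, `beta` and `temperature` parameters are hypothetical choices for this example, and the temperature-squared scaling often applied in practice is left out.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def decompose_kd(teacher_logits, student_logits, target, temperature=1.0):
    """Return (kd, tckd, nckd, teacher_target_prob) for one sample."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)

    # Classical KD: KL over the full class distribution.
    kd = kl(p_t, p_s)

    # TCKD: KL over the binary (target vs. everything else) distribution.
    tckd = kl([p_t[target], 1 - p_t[target]],
              [p_s[target], 1 - p_s[target]])

    # NCKD: KL over non-target classes, renormalised to sum to 1.
    hat_t = [p / (1 - p_t[target]) for i, p in enumerate(p_t) if i != target]
    hat_s = [p / (1 - p_s[target]) for i, p in enumerate(p_s) if i != target]
    nckd = kl(hat_t, hat_s)

    return kd, tckd, nckd, p_t[target]

def dkd_loss(teacher_logits, student_logits, target,
             alpha=1.0, beta=1.0, temperature=1.0):
    """Decoupled loss: TCKD and NCKD weighted independently.

    alpha and beta are illustrative hyperparameters for this sketch.
    """
    _, tckd, nckd, _ = decompose_kd(
        teacher_logits, student_logits, target, temperature)
    return alpha * tckd + beta * nckd

if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1, -1.0]
    student = [1.5, 0.3, 0.8, -0.5]
    kd, tckd, nckd, pt = decompose_kd(teacher, student, target=0)
    print("classical KD loss      :", kd)
    print("TCKD + (1 - p_t) * NCKD:", tckd + (1 - pt) * nckd)
```

In this form the coupling is visible: the classical loss fixes the NCKD weight at 1 - p_t, so the more confident the teacher is about the target class, the less NCKD contributes. Replacing that fixed weight with a free `beta` is what allows the two parts to be balanced independently.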
Why It Matters

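Logit distillation needs only the teacher's output logits, not its intermediate features. By showing that the classical KD loss couples TCKD and NCKD in a way that suppresses NCKD, the paper argues that logit distillation has more potential than its neglect in recent work suggests.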

Article Details
Source: OpenAlex
Category: Artificial Intelligence
Published: Jun 1, 2022
Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01165
Citations: 854
Authors: Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang