🤖 Artificial Intelligence OpenAlex

Decoupled Knowledge Distillation

📅 June 1, 2022 👤 Borui Zhao, Quan Cui, Renjie Song et al. 📖 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 📊 854 citations

🤖 Plain-English Summary

Advanced distillation methods are mainly based on distilling deep features from intermediate layers, while the significance of logit distillation is greatly overlooked. This paper proves the great potential of logit distillation, and we hope it will be helpful for future research.

🔑 Key Findings

  • To provide a novel viewpoint to study logit distillation, we re-formulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD).
  • We empirically investigate and prove the effects of the two parts: TCKD transfers knowledge concerning the “difficulty” of training samples, while NCKD is the prominent reason why logit distillation works.
  • More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts.
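
The split described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the decomposition takes the form KD = TCKD + (1 - p_t) * NCKD, where p_t is the teacher's probability for the target class, and it omits the softmax temperature for brevity. The function names, the `alpha`/`beta` parameters, and the example logits are illustrative placeholders.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def decompose_kd(teacher_logits, student_logits, target):
    """Return (kd, tckd, nckd, weight) for one sample.

    kd     : classical KD loss, KL(teacher || student) over all classes
    tckd   : KL between the binary (target vs. non-target) distributions
    nckd   : KL between the distributions renormalised over non-target classes
    weight : 1 - p_t (teacher), the factor that couples NCKD in classical KD
    Assumes the teacher is not perfectly confident, so 1 - p_t > 0.
    """
    p_tea = softmax(teacher_logits)
    p_stu = softmax(student_logits)

    # Target-class part: collapse all non-target classes into one bucket.
    b_tea = [p_tea[target], 1.0 - p_tea[target]]
    b_stu = [p_stu[target], 1.0 - p_stu[target]]
    tckd = kl(b_tea, b_stu)

    # Non-target part: drop the target class and renormalise the rest.
    hat_tea = [p / b_tea[1] for i, p in enumerate(p_tea) if i != target]
    hat_stu = [p / b_stu[1] for i, p in enumerate(p_stu) if i != target]
    nckd = kl(hat_tea, hat_stu)

    kd = kl(p_tea, p_stu)
    return kd, tckd, nckd, b_tea[1]


def decoupled_loss(tckd, nckd, alpha=1.0, beta=1.0):
    """Weight the two parts independently (alpha, beta are hypothetical defaults)."""
    return alpha * tckd + beta * nckd


# Illustrative logits for a 4-class problem; class 0 is the target.
kd, tckd, nckd, weight = decompose_kd(
    [5.0, 1.0, 0.5, -1.0], [2.0, 1.5, 0.0, -0.5], target=0
)
# The classical loss is recovered when beta is tied to 1 - p_t.
assert abs(kd - decoupled_loss(tckd, nckd, alpha=1.0, beta=weight)) < 1e-9
```

In this sketch the coupling is visible in the `weight` factor: when the teacher is confident about the target class, 1 - p_t is small and the NCKD term is scaled down. Replacing it with a free `beta` removes that dependence and lets the two parts be balanced separately.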

💡 Why This Matters

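Logit distillation needs only the outputs of the teacher and student networks, not their intermediate features. By showing that the classical KD loss couples TCKD and NCKD in a way that suppresses NCKD, the paper explains why logit distillation has been underestimated and motivates weighting the two parts independently.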

📋 Article Details

Category 🤖 Artificial Intelligence
Published Jun 01, 2022
Journal 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang
DOI 10.1109/cvpr52688.2022.01165
Citations 854
Source OpenAlex
