An Empirical Study of Training Self-Supervised Vision Transformers

📅 October 1, 2021 👤 Xinlei Chen, Saining Xie, Kaiming He 📖 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 📊 1,446 citations

🤖 Plain-English Summary

This paper does not describe a novel method. It discusses the currently positive evidence as well as challenges and open questions.

🔑 Key Findings

  • The paper studies a straightforward, incremental, yet must-know baseline given the recent progress in computer vision: self-supervised learning for Vision Transformers (ViT).
  • While the training recipes for standard convolutional networks are highly mature and robust, the recipes for ViT are yet to be built, especially in self-supervised scenarios where training becomes more challenging.
  • The authors go back to basics and investigate the effects of several fundamental components for training self-supervised ViT.

💡 Why This Matters

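Training recipes for ViT are not yet as mature as those for convolutional networks, particularly in self-supervised settings where training is more challenging. By investigating fundamental training components, this study offers a baseline for further work on self-supervised ViT.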

📋 Article Details

Category 🤖 Artificial Intelligence
Published Oct 01, 2021
Journal 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Authors Xinlei Chen, Saining Xie, Kaiming He
DOI 10.1109/iccv48922.2021.00950
Citations 1,446
Source OpenAlex
