
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

📅 October 1, 2021 👤 Ze Liu, Yutong Lin, Yue Cao et al. 📖 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 📊 29,920 citations

🤖 Plain-English Summary

This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. The hierarchical design and the shifted window approach also prove beneficial for all-MLP architectures.

🔑 Key Findings

  • Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
  • To address these differences, the authors propose a hierarchical Transformer whose representation is computed with shifted windows.
  • The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection.
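The windowing idea in the findings above can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the function names, the window size, the use of a cyclic roll to realize the shift, and the projection-free attention are choices made for this example, not details taken from the summary. A full implementation would also need learned projections, multiple heads, and some handling of the wrapped-around border regions, all of which are omitted here.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns shape (num_windows, window_size * window_size, C).
    Assumes H and W are divisible by window_size.
    """
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, C)

def shifted_window_partition(x, window_size):
    """Cyclically shift the map by half a window, then partition.

    The cyclic roll is an assumption of this sketch; it makes each new
    window straddle the boundaries of the unshifted windows.
    """
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)

def window_self_attention(windows):
    """Softmax self-attention computed independently inside each window.

    Hypothetical simplification: queries, keys, and values are the raw
    tokens, with no learned projections.
    """
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows

# Toy 8x8 single-channel map whose values are the flat pixel indices.
x = np.arange(64, dtype=np.float64).reshape(8, 8, 1)
regular = window_partition(x, 4)          # 4 windows of 16 tokens each
shifted = shifted_window_partition(x, 4)  # windows now cross old boundaries
attended = window_self_attention(regular)
```

On this toy 8x8 map, global self-attention would compare 64 x 64 = 4096 token pairs, while four 4x4 windows compare 4 x (16 x 16) = 1024 pairs. After the shift, the first window covers rows and columns 2 to 5 of the original map, so it mixes tokens from all four unshifted windows, which is the cross-window connection the finding describes.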

💡 Why This Matters

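A general-purpose vision backbone lets one architecture serve many computer vision tasks instead of requiring a separate design for each. Restricting self-attention to local windows also makes Transformers more practical for high-resolution images, where attending across every pixel region at once is costly.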

📋 Article Details

Category 🤖 Artificial Intelligence
Published Oct 01, 2021
Venue 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Authors Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, et al.
DOI 10.1109/iccv48922.2021.00986
Citations 29,920
Source OpenAlex
