CSWin Transformer: A General Vision Transformer Backbone wit...

🤖 Plain-English Summary

We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. By further pretraining on the larger dataset ImageNet-21K, we achieve 87.5% Top-1 accuracy on ImageNet-1K and high segmentation performance on ADE20K with 55.7 mIoU.

🔑 Key Findings

A challenging issue in Transformer design is that global self-attention is very expensive to compute whereas local self-attention often limits the field of interactions of each token.
To address this issue, we develop the Cross-Shaped Window self-attention mechanism for computing self-attention in the horizontal and vertical stripes in parallel that form a cross-shaped window, with each stripe obtained by splitting the input feature into stripes of equal width.
We provide a mathematical analysis of the effect of the stripe width and vary the stripe width for different layers of the Transformer network which achieves strong modeling capability while limiting the computation cost.

💡 Why This Matters

This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.

Read the full paper
Access the original peer-reviewed research via OpenAlex.

View on DOI ↗

📜 Copyright Notice: This page shows only metadata (title, authors, journal, date) and an original AI-generated summary. No abstract or full article text is copied. The original research is the intellectual property of its authors and publisher. ScienceTrace does not reproduce copyrighted content.

← More Artificial Intelligence All Research Articles

📋 Article Details

Category	🤖 Artificial Intelligence
Published	Jun 01, 2022
Journal	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors	Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu
DOI	10.1109/cvpr52688.2022.01181
Citations	1,228
Source	OpenAlex

🗂️ Research Categories

🤖 Artificial Intelligence 🧬 Medicine & Biology ⚛️ Physics & Space Science ⚙️ Engineering & Technology ∑ Mathematics

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

🤖 Plain-English Summary

🔑 Key Findings

💡 Why This Matters

📋 Article Details

🗂️ Research Categories

🔗 Related Resources

More 🤖 Artificial Intelligence Research