🤖 Artificial Intelligence OpenAlex

Swin Transformer V2: Scaling Up Capacity and Resolution

📅 June 1, 2022 👤 Ze Liu, Han Hu, Yutong Lin et al. 📖 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 📊 2,188 citations

🤖 Plain-English Summary

We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. Using these techniques and self-supervised pre-training, we successfully train a strong 3-billion-parameter Swin Transformer model and effectively transfer it to various vision tasks involving high-resolution images or windows, achieving state-of-the-art accuracy on a variety of benchmarks.

🔑 Key Findings

  • By scaling up capacity and resolution, Swin Transformer sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1 / 54.4 box / mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification.
  • We tackle issues of training instability, and study how to effectively transfer models pre-trained at low resolutions to higher-resolution ones.
  • To this end, several novel techniques are proposed: 1) a residual post normalization technique and a scaled cosine attention approach to improve the stability of large vision models; 2) a log-spaced continuous position bias technique to effectively transfer models pre-trained on low-resolution images and windows to their higher-resolution counterparts.
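
The two attention-related ideas in the last finding can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes the commonly cited formulation in which the attention logit is the cosine similarity of query and key divided by a temperature `tau` (kept above 0.01) plus a position bias, and in which relative offsets are mapped to `sign(d) * log(1 + |d|)` before being fed to a small network that produces the bias. The function names and the default `tau=0.1` are hypothetical choices for illustration, and the bias network itself is omitted.

```python
import math


def log_spaced(delta):
    """Map a signed relative offset d to sign(d) * log(1 + |d|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def cosine_similarity(q, k, eps=1e-12):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    return dot / max(norm_q * norm_k, eps)


def scaled_cosine_logit(q, k, tau=0.1, bias=0.0):
    """Attention logit: cos(q, k) / tau + bias.

    tau is clamped to at least 0.01 (an assumption of this sketch), so the
    logit stays bounded no matter how large the activations grow.
    """
    tau = max(tau, 0.01)
    return cosine_similarity(q, k) / tau + bias


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[10.0, 0.0], [0.0, 3.0]]  # very different magnitudes
    logits = [scaled_cosine_logit(q, k) for k in keys]
    print("attention weights:", softmax(logits))
    # Largest offset in an 8x8 window is 7; in a 16x16 window it is 15.
    print("log-spaced extremes:", log_spaced(7), log_spaced(15))
```

Because the logit depends only on the angle between query and key, rescaling either vector leaves it unchanged, which is one plausible reading of why this helps stability in large models. For the position bias, going from an 8×8 to a 16×16 window stretches the largest linear offset from 7 to 15 (a factor of 15/7), while the log-spaced value grows only from log 8 to log 16 (a factor of 4/3), so the bias network has to extrapolate much less.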

💡 Why This Matters

This research shows that a vision Transformer can be trained stably at billions of parameters and transferred from low to high resolution, which matters for high-resolution tasks such as object detection and semantic segmentation.

📋 Article Details

Category 🤖 Artificial Intelligence
Published Jun 01, 2022
Journal 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie
DOI 10.1109/cvpr52688.2022.01170
Citations 2,188
Source OpenAlex
