🤖 Artificial Intelligence OpenAlex

Swin Transformer V2: Scaling Up Capacity and Resolution

📅 Published: June 1, 2022 👤 Ze Liu, Han Hu, Yutong Lin et al. 📖 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 📊 2,188 citations
AI-Generated Summary

We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536x1,536 resolution. Using these techniques and self-supervised pre-training, we successfully train a strong 3-billion-parameter Swin Transformer model and effectively transfer it to various vision tasks involving high-resolution images or windows, achieving state-of-the-art accuracy on a variety of benchmarks.

⚡ This is an original paraphrased summary — not copied from the abstract. Full paper available at the source link below.

Key Findings
  • 1 By scaling up capacity and resolution, Swin Transformer sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1/54.4 box/mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification.
  • 2 We tackle issues of training instability, and study how to effectively transfer models pre-trained at low resolutions to higher resolution ones.
  • 3 To this end, several novel techniques are proposed: 1) a residual post normalization technique and a scaled cosine attention approach to improve the stability of large vision models; 2) a log-spaced continuous position bias technique to effectively transfer models pre-trained on low-resolution images and windows to their higher-resolution counterparts.
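
Two of the techniques named in the third finding can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal pure-Python illustration that assumes the usual formulation, in which an attention logit is the cosine similarity of query and key divided by a temperature, plus a position bias, and in which relative offsets are mapped to sign(d) * log(1 + |d|) before being used for the position bias. The function names and the fixed tau value are hypothetical; in the paper the temperature is learned rather than fixed.

```python
import math


def log_spaced(delta):
    """Map a relative offset to log-spaced form: sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def cosine_logit(q, k, tau=0.1, bias=0.0):
    """Attention logit as cosine similarity of q and k, divided by tau, plus a bias.

    tau is a fixed illustrative value here (an assumption for this sketch).
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    return dot / (norm_q * norm_k) / tau + bias


# Cosine similarity ignores vector magnitude, so scaling the query by 10
# leaves the logit unchanged. A plain dot product would grow tenfold.
small = cosine_logit([1.0, 2.0], [2.0, 1.0])
large = cosine_logit([10.0, 20.0], [2.0, 1.0])

# Moving from an 8-wide to a 16-wide window raises the largest offset from
# 7 to 15. The log-spaced range grows by a smaller factor than the raw range.
raw_ratio = 15 / 7
log_ratio = log_spaced(15) / log_spaced(7)
```

Because the logit is bounded by 1/tau in magnitude regardless of how large the activations become, it is less prone to the extreme attention values that can destabilize large models. The log-spaced mapping compresses large offsets, so a position bias learned on small windows has to extrapolate over a narrower range when the window grows.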
Why It Matters

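Large vision models have been hard to scale for two reasons: training becomes unstable as capacity grows, and models pre-trained at low resolution do not transfer easily to high-resolution inputs. By addressing both problems, this work makes billion-parameter vision backbones practical for high-resolution tasks such as object detection, semantic segmentation, and video action classification.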

This summary is based on publicly available metadata and the abstract. For the full research paper, visit the original source at OpenAlex.
Article Details
Source OpenAlex
Category 🤖 Artificial Intelligence
Published Jun 1, 2022
Journal 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI 10.1109/cvpr52688.2022.01170
Citations 2,188
Authors Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie