We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. Using these techniques and self-supervised pre-training, we successfully train a strong 3-billion-parameter Swin Transformer model and effectively transfer it to various vision tasks involving high-resolution images or windows, achieving advanced accuracy on a variety of benchmarks.
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Jun 01, 2022 |
| Journal | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Authors | Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie |
| DOI | 10.1109/cvpr52688.2022.01170 |
| Citations | 2,188 |
| Source | OpenAlex |