This paper presents SimMIM, a simple framework for masked image modeling. We also use this approach to address the data-hungry nature of large-scale model training: a 3B-parameter model (SwinV2-G) is successfully trained to state-of-the-art accuracy on four representative vision benchmarks using 40× less labelled data than in previous practice (JFT-3B).
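The summary above does not spell out the masked-image-modeling objective, so the following is a minimal NumPy sketch of a SimMIM-style setup. It assumes details reported in the full paper but not stated on this page: random masking of coarse 32×32 patches at a 0.6 ratio over a 192×192 input, an encoder with 4×4 tokens, and an L1 loss on raw pixels computed only over masked regions. The function names `random_patch_mask` and `masked_l1_loss` are hypothetical, and the encoder and prediction head are omitted entirely.

```python
import math
import numpy as np

def random_patch_mask(img_size=192, mask_patch_size=32, model_patch_size=4,
                      mask_ratio=0.6, rng=None):
    """Return a binary mask (1 = masked) at the encoder's token resolution.

    Assumed defaults, not taken from this summary: 32x32 mask patches,
    4x4 model tokens, 60% of mask patches hidden.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid = img_size // mask_patch_size            # e.g. a 6 x 6 grid of mask patches
    num_patches = grid * grid
    num_masked = int(math.ceil(num_patches * mask_ratio))
    flat = np.zeros(num_patches, dtype=np.int64)
    flat[rng.permutation(num_patches)[:num_masked]] = 1   # pick patches uniformly at random
    coarse = flat.reshape(grid, grid)
    scale = mask_patch_size // model_patch_size   # tokens per mask patch along each side
    # Upsample the coarse mask so every token inside a masked patch is masked.
    return coarse.repeat(scale, axis=0).repeat(scale, axis=1)

def masked_l1_loss(pred, target, token_mask, model_patch_size=4):
    """L1 loss on raw pixels, averaged over masked pixels and channels only.

    pred, target: arrays of shape (C, H, W); token_mask: (H / p, W / p).
    """
    pixel_mask = token_mask.repeat(model_patch_size, axis=0).repeat(model_patch_size, axis=1)
    per_pixel = np.abs(pred - target)             # (C, H, W); mask broadcasts over C
    in_chans = pred.shape[0]
    # Unmasked pixels contribute nothing; normalize by masked-pixel count and channels.
    return float((per_pixel * pixel_mask).sum() / (pixel_mask.sum() + 1e-5) / in_chans)

rng = np.random.default_rng(0)
mask = random_patch_mask(rng=rng)                 # shape (48, 48): 22 of 36 patches masked
image = rng.random((3, 192, 192))
print(mask.shape, masked_l1_loss(image + 1.0, image, mask))
```

Because the loss ignores visible pixels, a prediction that is wrong only on unmasked regions scores zero. The paper reports that masking fairly large patches makes the task harder to solve by simply interpolating from visible neighbours, which is the motivation for the coarse mask grid assumed here.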
| Field | Value |
|---|---|
| Category | 🤖 Artificial Intelligence |
| Published | Jun 01, 2022 |
| Journal | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Authors | Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao |
| DOI | 10.1109/cvpr52688.2022.00943 |
| Citations | 1,146 |
| Source | OpenAlex |