Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image. In particular, the authors' CMT-S model achieves 83.5% top-1 accuracy on ImageNet while requiring 14x and 2x fewer FLOPs than the existing DeiT and EfficientNet, respectively.
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Jun 01, 2022 |
| Journal | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Authors | Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen |
| DOI | 10.1109/cvpr52688.2022.01186 |
| Citations | 850 |
| Source | OpenAlex |