Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. [...] As a result, we successfully train a ViT model with two billion parameters, which attains a new state of the art on ImageNet of 90.45% top-1 accuracy.
This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.
| Category | 🤖 Artificial Intelligence |
| Published | Jun 01, 2022 |
| Journal | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Authors | Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer |
| DOI | 10.1109/cvpr52688.2022.01179 |
| Citations | 783 |
| Source | OpenAlex |