
Scaling Vision Transformers

📅 June 1, 2022 👤 Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby et al. 📖 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 📊 783 citations

🤖 Plain-English Summary

Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Drawing on a study of how these models scale, we successfully train a ViT model with two billion parameters, which attains a new state of the art on ImageNet of 90.45% top-1 accuracy.

🔑 Key Findings

  • Scale is a primary ingredient in attaining excellent results; understanding a model's scaling properties is therefore key to designing future generations effectively.
  • While the laws for scaling Transformer language models have been studied, it is unknown how Vision Transformers scale.
  • To address this, we scale ViT models and data, both up and down, and characterize the relationships between error rate, data, and compute.
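
As a rough illustration of what "characterizing the relationship between error rate and compute" can look like in practice, the sketch below fits a simple power law, error ≈ a · compute^(−b), by linear regression in log-log space. The power-law form, the function name fit_power_law, and the data points are all assumptions made for illustration; the summary above does not state the functional form used in the paper, and the numbers are synthetic, not results from it.

```python
import numpy as np


def fit_power_law(compute, error):
    """Fit error ≈ a * compute**(-b) by least squares in log-log space.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    log_c = np.log(np.asarray(compute, dtype=float))
    log_e = np.log(np.asarray(error, dtype=float))
    # In log space the model is linear: log(error) = log(a) - b * log(compute).
    slope, intercept = np.polyfit(log_c, log_e, deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic points generated from a = 2.0, b = 0.25 (NOT data from the paper).
compute = np.array([1e1, 1e2, 1e3, 1e4])
error = 2.0 * compute ** -0.25

a, b = fit_power_law(compute, error)
print(f"a={a:.3f}, b={b:.3f}")
```

On real measurements a pure power law is only a first approximation, since error typically flattens out at very large or very small scales, so a fit like this is best treated as a starting point rather than a faithful model.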

💡 Why This Matters

Scaling behavior had been studied for Transformer language models but not for Vision Transformers. Relating error rate to data and compute gives a basis for planning how to train larger vision models.

📋 Article Details

Category 🤖 Artificial Intelligence
Published Jun 01, 2022
Journal 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer
DOI 10.1109/cvpr52688.2022.01179
Citations 783
Source OpenAlex
