🤖 Artificial Intelligence OpenAlex

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

📅 Published: October 1, 2021 👤 Chun-Fu Richard Chen, Quanfu Fan, Rameswar Panda 📖 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 📊 1,938 citations
AI-Generated Summary

The vision transformer (ViT) has recently achieved promising results on image classification compared with convolutional neural networks. This paper introduces CrossViT, a dual-branch transformer that learns multi-scale feature representations. On the ImageNet1K dataset, with some architectural changes, it outperforms the recent DeiT by a margin of 2%, at the cost of a small to moderate increase in FLOPs and model parameters.

Key Findings
  • Motivated by ViT's results on image classification, the paper studies how to learn multi-scale feature representations in transformer models.
  • To this end, the authors propose a dual-branch transformer that combines image patches (i.e., tokens in a transformer) of different sizes to produce stronger image features.
  • The approach processes small-patch and large-patch tokens in two separate branches of different computational complexity; the two sets of tokens are then fused purely by attention, multiple times, so that they complement each other.
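
The fusion step described in the findings above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the image size (224), the patch sizes (16 and 32), the shared embedding width, the use of a single summary vector as the query, and the helper names `num_tokens`, `softmax`, and `cross_attend` are all assumptions made for the example, and learned projections are omitted.

```python
import math
import random


def num_tokens(image_size, patch_size):
    """Count the non-overlapping square patches (tokens) in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, tokens):
    """Attend from one query vector to the tokens of the other branch.

    Single head, no learned projections (an assumption to keep this short).
    Returns the attention-weighted mix of tokens and the weights themselves.
    """
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    mixed = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
    return mixed, weights


# Illustrative sizes only; they are not taken from the summary.
IMAGE_SIZE, SMALL_PATCH, LARGE_PATCH, DIM = 224, 16, 32, 8

n_small = num_tokens(IMAGE_SIZE, SMALL_PATCH)  # 196 tokens in the small-patch branch
n_large = num_tokens(IMAGE_SIZE, LARGE_PATCH)  # 49 tokens in the large-patch branch

rng = random.Random(0)
small_tokens = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n_small)]
large_summary = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # stand-in for the large branch

# The large-patch branch queries the small-patch tokens, then keeps a residual.
mixed, weights = cross_attend(large_summary, small_tokens)
fused_summary = [a + b for a, b in zip(large_summary, mixed)]

print(n_small, n_large, len(fused_summary))  # prints "196 49 8"
```

With these example sizes the small-patch branch carries four times as many tokens as the large-patch branch, which is one way the two branches can end up with different computational cost. Running the same exchange in the opposite direction, and repeating it, would correspond to fusing the branches "multiple times".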
Why It Matters

Learning features at more than one patch scale lets a vision transformer improve image-classification accuracy for only a small to moderate increase in FLOPs and model parameters, as the reported 2% gain over DeiT on ImageNet1K indicates.

This summary is based on publicly available metadata and the paper's abstract. The full paper is available from OpenAlex or via the DOI listed under Article Details.
Article Details
Source OpenAlex
Category 🤖 Artificial Intelligence
Published Oct 1, 2021
Journal 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI 10.1109/iccv48922.2021.00041
Citations 1,938
Authors Chun-Fu Richard Chen, Quanfu Fan, Rameswar Panda