🤖 Artificial Intelligence OpenAlex

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

📅 Published: June 1, 2022 👤 Yanghao Li, Chao-Yuan Wu, Haoqi Fan et al. 📖 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 📊 719 citations
AI-Generated Summary

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification and for object detection. Without bells and whistles, MViTv2 achieves state-of-the-art performance in three domains: 88.8% accuracy on ImageNet classification, 58.7 box AP on COCO object detection, and 86.1% accuracy on Kinetics-400 video classification.


Key Findings
  • We present an improved version of MViT that incorporates decomposed relative positional embeddings and residual pooling connections.
  • We instantiate this architecture in five sizes and evaluate it on ImageNet classification, COCO object detection, and Kinetics video recognition, where it outperforms prior work.
  • We further compare MViTv2's pooling attention with window attention mechanisms and find that it outperforms the latter on the accuracy/compute trade-off.
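Pooling attention with a residual pooling connection, as named in the findings above, can be sketched in plain Python. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: the Q/K/V projections are taken to be the identity, plain average pooling stands in for the learned pooling operator, and the names `avg_pool_2d` and `pooling_attention` are hypothetical. Pooling K and V shrinks the number of keys each query attends to, and the residual connection adds the pooled query back to the attention output.

```python
import math


def avg_pool_2d(tokens, h, w, stride):
    """Average-pool a row-major h*w grid of token vectors.

    Kernel size equals the stride (non-overlapping windows). This is an
    assumed stand-in for the learned pooling operator used in MViTv2.
    """
    out_h, out_w = h // stride, w // stride
    dim = len(tokens[0])
    pooled = []
    for oy in range(out_h):
        for ox in range(out_w):
            acc = [0.0] * dim
            for dy in range(stride):
                for dx in range(stride):
                    tok = tokens[(oy * stride + dy) * w + ox * stride + dx]
                    for c in range(dim):
                        acc[c] += tok[c]
            pooled.append([a / (stride * stride) for a in acc])
    return pooled, out_h, out_w


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pooling_attention(x, h, w, q_stride=1, kv_stride=2, residual=True):
    """Single-head pooling attention over a row-major h*w token grid.

    Assumes identity Q/K/V projections (hypothetical simplification).
    With residual=True the pooled query is added to the attention output
    (the residual pooling connection). Returns tokens and output grid size.
    """
    q, out_h, out_w = avg_pool_2d(x, h, w, q_stride)
    k, _, _ = avg_pool_2d(x, h, w, kv_stride)
    v = k  # identity projections, so pooled V equals pooled K here
    scale = 1.0 / math.sqrt(len(x[0]))
    out = []
    for qi in q:
        attn = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        z = [sum(attn[j] * v[j][c] for j in range(len(v))) for c in range(len(qi))]
        if residual:
            z = [zc + qc for zc, qc in zip(z, qi)]
        out.append(z)
    return out, out_h, out_w


x = [[float(i), float(i % 3)] for i in range(16)]  # 4x4 grid of 2-dim tokens
out, oh, ow = pooling_attention(x, 4, 4)
print(len(out), oh, ow)  # 16 4 4
```

On this 4×4 grid with `kv_stride=2`, each of the 16 queries attends to 4 pooled keys instead of 16, so the score matrix has 64 entries rather than 256. Setting `q_stride=2` also downsamples the output to a 2×2 grid, which is how a multiscale stage can reduce spatial resolution.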
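The decomposed relative positional embedding from the first finding replaces one table indexed by 2D offsets with separate tables for height offsets and width offsets; the attention bias for query i and key j is then q_i dotted with the sum of the two looked-up vectors. The sketch below assumes queries and keys share a single 2D grid with no pooling and omits the time axis used for video; `table_sizes` and `decomposed_rel_pos_bias` are hypothetical helper names, not part of any released codebase.

```python
def table_sizes(h, w):
    """Count learned offset vectors for an h*w grid: one joint 2D table
    versus separate height and width tables (the decomposed scheme)."""
    joint = (2 * h - 1) * (2 * w - 1)
    decomposed = (2 * h - 1) + (2 * w - 1)
    return joint, decomposed


def decomposed_rel_pos_bias(q, h, w, rel_h, rel_w):
    """Bias[i][j] = q_i . (rel_h[dh] + rel_w[dw]) for every query/key pair.

    Assumes queries and keys share one row-major h*w grid (no pooling) and
    no time axis. Offsets dh = h(i) - h(j) lie in [-(h-1), h-1], so adding
    h - 1 maps them onto indices 0..2h-2 of rel_h; likewise for widths.
    """
    n = h * w
    bias = [[0.0] * n for _ in range(n)]
    for i in range(n):
        hi, wi = divmod(i, w)  # row and column of query i
        for j in range(n):
            hj, wj = divmod(j, w)  # row and column of key j
            r_h = rel_h[hi - hj + h - 1]
            r_w = rel_w[wi - wj + w - 1]
            bias[i][j] = sum(qc * (a + b) for qc, a, b in zip(q[i], r_h, r_w))
    return bias


print(table_sizes(56, 56))  # (12321, 222)
```

For an illustrative 56×56 token grid, a joint table needs 12,321 offset vectors while the per-axis tables need only 222, which shows why splitting the embedding by axis is much cheaper in parameters and memory.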
Why It Matters

MViTv2 shows that a single multiscale transformer backbone can deliver strong results across image classification, object detection, and video recognition, which makes it a practical general-purpose backbone for vision systems.

This summary is based on publicly available metadata and the abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source OpenAlex
Category 🤖 Artificial Intelligence
Published Jun 1, 2022
Journal 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI 10.1109/cvpr52688.2022.00476
Citations 719
Authors Yanghao Li, Chao-Yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong