🤖 Artificial Intelligence OpenAlex

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

📅 Published: October 1, 2021 👤 Wenhai Wang, Enze Xie, Xiang Li et al. 📖 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 📊 4,679 citations
AI-Generated Summary

Although convolutional neural networks (CNNs) have achieved great success in computer vision, this work investigates a simpler, convolution-free backbone network that is useful for many dense prediction tasks. For example, with a comparable number of parameters, PVT+RetinaNet achieves 40.4 AP on the COCO dataset, surpassing ResNet50+RetinaNet (36.3 AP) by 4.1 absolute AP (see Figure 2 of the paper).

⚡ This is an original paraphrased summary — not copied from the abstract. Full paper available at the source link below.

Key Findings
  • 1 Unlike the recently proposed Vision Transformer (ViT), which was designed specifically for image classification, the Pyramid Vision Transformer (PVT) introduced here overcomes the difficulties of porting Transformers to various dense prediction tasks.
  • 2 PVT has several merits compared to the current state of the art.
  • 3 Unlike ViT, which typically yields low-resolution outputs and incurs high computational and memory costs, PVT can be trained on dense partitions of an image to achieve the high output resolution that dense prediction needs, and it uses a progressive shrinking pyramid to reduce the computation on large feature maps.
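
The progressive shrinking pyramid described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' implementation: the function name `pyramid_shapes`, the per-stage patch sizes (4, 2, 2, 2), and the 224x224 input are assumptions chosen for illustration and do not come from this summary. It shows how starting from small patches gives a fine token grid in the first stage, while each later stage shrinks the grid so that fewer tokens need to be processed.

```python
def pyramid_shapes(height, width, patch_sizes=(4, 2, 2, 2)):
    """Return (stride, grid_h, grid_w, num_tokens) for each pyramid stage.

    patch_sizes are hypothetical per-stage downsampling factors; each stage
    splits its input grid into non-overlapping patches of that size.
    """
    shapes = []
    stride = 1
    for patch in patch_sizes:
        stride *= patch  # cumulative downsampling relative to the input image
        grid_h, grid_w = height // stride, width // stride
        shapes.append((stride, grid_h, grid_w, grid_h * grid_w))
    return shapes


if __name__ == "__main__":
    for stride, grid_h, grid_w, tokens in pyramid_shapes(224, 224):
        print(f"stride {stride:>2}: {grid_h}x{grid_w} grid, {tokens} tokens")

    # For comparison: a single-scale model with 16x16 patches keeps one
    # coarse 14x14 grid (196 tokens) for the whole network.
    print("single-scale, patch 16:", (224 // 16) ** 2, "tokens")
```

Under these assumptions the first stage works on a 56x56 grid, which is 16 times more tokens than a single 14x14 grid, and that is why shrinking the grid in later stages matters for keeping the cost manageable.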
Why It Matters

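By showing that a convolution-free Transformer backbone can outperform a CNN backbone of comparable size on dense prediction (40.4 AP versus 36.3 AP on COCO with RetinaNet), this work makes Transformers a practical option for tasks such as object detection, not only image classification.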

This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source OpenAlex
Category 🤖 Artificial Intelligence
Published Oct 1, 2021
Journal 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI 10.1109/iccv48922.2021.00061
Citations 4,679
Authors Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song