🤖 Artificial Intelligence OpenAlex

PVT v2: Improved baselines with pyramid vision transformer

📅 March 16, 2022 👤 Wenhai Wang, Enze Xie, Xiang Li et al. 📖 Computational Visual Media 📊 2,169 citations

🤖 Plain-English Summary

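Transformers have recently led to encouraging progress in computer vision. PVT v2 revisits the original Pyramid Vision Transformer and modifies it to lower its computational cost and improve its accuracy. The authors hope this work will facilitate further transformer research in computer vision.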

🔑 Key Findings

  • The authors present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network.
  • With these modifications, PVT v2 reduces the computational complexity of PVT v1 from quadratic to linear in the number of input tokens and delivers significant improvements on fundamental vision tasks such as classification, detection, and segmentation.
  • In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin transformer.
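The back-of-envelope sketch below illustrates two of the three designs: how an overlapping patch embedding keeps the same output resolution as a non-overlapping one, and why pooling keys and values to a fixed grid makes attention cost linear in the number of tokens. It is not the authors' code. All function names are hypothetical. The embedding uses kernel 2S−1 and padding S−1 for stride S, the pooled grid size P = 7, the reduction ratio R = 8, and the channel width c = 64; all of these are illustrative assumptions. The cost model counts only the two matrix products (Q·Kᵀ and A·V) and ignores projections, softmax, and pooling.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_embed_size(size: int, stride: int, overlapping: bool) -> int:
    """Output size of a patch embedding (hypothetical helper).

    Non-overlapping (PVT v1 style, assumed): kernel = stride, padding = 0.
    Overlapping (assumed setting): kernel = 2*stride - 1, padding = stride - 1,
    so neighbouring windows share stride - 1 pixels along each axis.
    """
    if overlapping:
        return conv_out_size(size, 2 * stride - 1, stride, stride - 1)
    return conv_out_size(size, stride, stride, 0)


def attention_cost(h: int, w: int, c: int, kv_tokens: int) -> int:
    """Multiply-adds for Q @ K^T plus A @ V with h*w queries and kv_tokens keys."""
    return 2 * (h * w) * kv_tokens * c


def sra_cost(h: int, w: int, c: int, R: int) -> int:
    # Spatial-reduction attention: keys/values downsampled by R along each axis,
    # so kv_tokens still grows with h*w and the total cost is quadratic in h*w.
    return attention_cost(h, w, c, (h // R) * (w // R))


def linear_sra_cost(h: int, w: int, c: int, P: int = 7) -> int:
    # Linear variant: keys/values average-pooled to a fixed P x P grid,
    # so kv_tokens is constant and the total cost is linear in h*w.
    return attention_cost(h, w, c, P * P)


if __name__ == "__main__":
    # Stage-1 embedding of a 224x224 image (stride 4), then a stride-2 stage.
    print(patch_embed_size(224, 4, overlapping=True))  # 56
    print(patch_embed_size(56, 2, overlapping=True))   # 28
    # Doubling the side length quadruples the tokens: SRA cost grows 16x,
    # the pooled variant only 4x.
    for side in (56, 112):
        print(side, sra_cost(side, side, 64, R=8), linear_sra_cost(side, side, 64))
```

With these illustrative settings the two attention variants cost the same at 56×56 (both attend to 49 key tokens). At 112×112 the spatial-reduction version becomes four times more expensive, because its key count keeps growing with resolution while the pooled grid stays fixed.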

💡 Why This Matters

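PVT v2 is a general-purpose vision backbone. Because it produces multi-scale (pyramid) feature maps like a CNN, it can be plugged into dense prediction pipelines such as object detection and semantic segmentation, and its linear-complexity attention keeps the cost manageable on high-resolution inputs.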


📋 Article Details

Category: 🤖 Artificial Intelligence
Published: Mar 16, 2022
Journal: Computational Visual Media
Authors: Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song et al.
DOI: 10.1007/s41095-022-0274-8
Citations: 2,169
Source: OpenAlex
