🤖 Artificial Intelligence OpenAlex

Conformer: Local Features Coupling Global Representations for Visual Recognition

📅 October 1, 2021 👤 Zhiliang Peng, Wei Huang, Shanzhi Gu et al. 📖 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 📊 781 citations

🤖 Plain-English Summary

Within convolutional neural networks (CNNs), convolution operations are good at extracting local features but have difficulty capturing global representations. Conformer, a hybrid network that couples convolutions with self-attention, is designed to address this. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAP for object detection and instance segmentation, respectively, demonstrating its potential as a general backbone network.
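To make the local-versus-global contrast concrete, here is a minimal sketch (not taken from the paper) of how slowly the receptive field of stacked convolutions grows. It assumes stride-1, undilated k × k convolutions with no pooling, so each layer widens the view by k - 1 pixels. Real CNNs such as ResNet use strides and pooling to grow the receptive field faster. The function names are hypothetical.

```python
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field (pixels per side) of stacked stride-1, undilated convolutions.

    Assumption: no strides or pooling, so each k x k layer adds k - 1 pixels.
    """
    return 1 + num_layers * (kernel_size - 1)


def layers_to_cover(image_size: int, kernel_size: int = 3) -> int:
    """Smallest number of such layers whose receptive field spans the whole image."""
    layers = 0
    while receptive_field(layers, kernel_size) < image_size:
        layers += 1
    return layers


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(f"{n:>2} conv layers -> {receptive_field(n)} px receptive field")
    print("3x3 layers needed to span 224 px:", layers_to_cover(224))  # 112
```

By contrast, a single self-attention layer relates every token to every other token, so its view spans the whole input after one layer. That is the property the transformer side brings to a hybrid design.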

🔑 Key Findings

  • Within visual transformers, cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details.
  • In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of both convolutional operations and self-attention mechanisms for enhanced representation learning.
  • Conformer is rooted in the Feature Coupling Unit (FCU), which interactively fuses local features and global representations at different resolutions.
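The two-way exchange performed by the FCU can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: a 1×1 convolution aligns channel widths, non-overlapping average pooling and nearest-neighbour upsampling bridge the two resolutions, and each result is added to the other branch. The function names, tensor sizes (64 CNN channels, 384-dimensional tokens, 56×56 maps, a resolution factor of 4), and random weights are hypothetical, and the sketch omits normalization, nonlinearities, and training.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: (C_in, H, W) map times a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weight, x)


def avg_pool(x, factor):
    """Downsample a (C, H, W) map by non-overlapping factor x factor averaging."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def upsample_nearest(x, factor):
    """Upsample a (C, H, W) map by repeating each pixel factor x factor times."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def cnn_to_tokens(feat, w_down, factor):
    """Hypothetical CNN -> transformer path: align channels, lower resolution, flatten."""
    x = avg_pool(conv1x1(feat, w_down), factor)  # (D, H/f, W/f)
    return x.reshape(x.shape[0], -1).T           # (N, D), N = (H/f) * (W/f)


def tokens_to_cnn(tokens, w_up, grid_hw, factor):
    """Hypothetical transformer -> CNN path: unflatten, raise resolution, align channels."""
    gh, gw = grid_hw
    x = tokens.T.reshape(-1, gh, gw)                    # (D, H/f, W/f)
    return conv1x1(upsample_nearest(x, factor), w_up)  # (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D, H, W, f = 64, 384, 56, 56, 4  # hypothetical sizes
    feat = rng.standard_normal((C, H, W))                # local (CNN) branch
    tokens = rng.standard_normal(((H // f) * (W // f), D))  # global (token) branch
    w_down = rng.standard_normal((D, C)) * 0.02
    w_up = rng.standard_normal((C, D)) * 0.02

    tokens = tokens + cnn_to_tokens(feat, w_down, f)  # local details -> global branch
    feat = feat + tokens_to_cnn(tokens, w_up, (H // f, W // f), f)  # global context -> local
    print(feat.shape, tokens.shape)  # (64, 56, 56) (196, 384)
```

Average pooling and nearest-neighbour upsampling are used here only for simplicity; any resolution-matching operator would illustrate the same idea of exchanging information between a high-resolution local branch and a low-resolution global branch.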

💡 Why This Matters

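This work shows that coupling convolutional local features with self-attention-based global representations can yield a stronger general-purpose backbone for visual recognition, as reflected in its reported gains over ResNet-101 on MSCOCO object detection and instance segmentation.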


📋 Article Details

Category 🤖 Artificial Intelligence
Published Oct 01, 2021
Journal 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Authors Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie, Yaowei Wang
DOI 10.1109/iccv48922.2021.00042
Citations 781
Source OpenAlex
