Contextual Transformer Networks for Visual Recognition

📅 April 1, 2022 👤 Yehao Li, Ting Yao, Yingwei Pan et al. 📖 IEEE Transactions on Pattern Analysis and Machine Intelligence 📊 698 citations

🤖 Plain-English Summary

Transformers built on self-attention have revolutionized natural language processing and have recently inspired Transformer-style architectures that achieve competitive results on numerous computer vision tasks. This paper introduces the Contextual Transformer (CoT) block and CoTNet, the network built from it. Through extensive experiments over a wide range of applications (image recognition, object detection, instance segmentation, and semantic segmentation), the authors validate CoTNet as a stronger backbone.

🔑 Key Findings

  • Most existing designs apply self-attention directly over a 2D feature map, computing the attention matrix from pairs of isolated queries and keys at each spatial location, and so leave the rich context among neighboring keys under-exploited.
  • The paper introduces a novel Transformer-style module, the Contextual Transformer (CoT) block, for visual recognition.
  • This design capitalizes on the contextual information among input keys to guide the learning of a dynamic attention matrix, which strengthens the capacity of the visual representation.
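
To make the idea of key context guiding the attention matrix concrete, here is a minimal NumPy sketch. It follows the recipe given in the paper's abstract: a 3×3 convolution encodes the keys into a static context, the encoded keys are concatenated with the queries and passed through two consecutive 1×1 convolutions to produce attention, the attention is applied to the values to give a dynamic context, and the two contexts are fused. Everything else is an assumption made for illustration and is not the authors' implementation: the names `cot_block`, `conv3x3`, `init_params`, and the `w_*` weights are hypothetical, there is a single attention head, normalization layers are omitted, fusion is a plain sum, and the softmax over each 3×3 neighborhood is this sketch's own choice.

```python
import numpy as np

# Offsets of a 3x3 neighborhood, in row-major order.
OFFSETS = [(dy, dx) for dy in range(3) for dx in range(3)]


def conv3x3(x, w):
    """Zero-padded 3x3 convolution. x: (H, W, C_in); w: (3, 3, C_in, C_out)."""
    h, wd, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for dy, dx in OFFSETS:
        # Each shifted view contributes through its own (C_in, C_out) weight slice.
        out += padded[dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return out


def init_params(channels, hidden, seed=0):
    """Random weights for the sketch (hypothetical names and shapes)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "w_k": rng.normal(0.0, scale, (3, 3, channels, channels)),  # 3x3 key conv
        "w_v": rng.normal(0.0, scale, (channels, channels)),        # 1x1 value conv
        "w_1": rng.normal(0.0, scale, (2 * channels, hidden)),      # first 1x1 conv
        "w_2": rng.normal(0.0, scale, (hidden, 9)),                 # second 1x1 conv
    }


def cot_block(x, params):
    """Simplified single-head CoT-style block. x: (H, W, C) -> (H, W, C)."""
    h, wd, _ = x.shape
    queries, keys = x, x
    values = x @ params["w_v"]

    # Static context: each key is encoded together with its 3x3 neighbors.
    k_static = conv3x3(keys, params["w_k"])

    # Attention is predicted from [contextualized keys, queries],
    # not from isolated query-key pairs.
    joint = np.concatenate([k_static, queries], axis=-1)
    hidden = np.maximum(joint @ params["w_1"], 0.0)
    logits = hidden @ params["w_2"]  # (H, W, 9): one weight per neighbor

    # Assumption of this sketch: normalize over the 3x3 neighborhood.
    logits = logits - logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)

    # Dynamic context: attention-weighted sum of neighboring values.
    v_pad = np.pad(values, ((1, 1), (1, 1), (0, 0)))
    k_dynamic = np.zeros_like(values)
    for i, (dy, dx) in enumerate(OFFSETS):
        k_dynamic += attn[:, :, i:i + 1] * v_pad[dy:dy + h, dx:dx + wd, :]

    # Fuse static and dynamic contexts (a plain sum in this sketch).
    return k_static + k_dynamic


x = np.random.default_rng(1).normal(size=(5, 6, 4))
y = cot_block(x, init_params(channels=4, hidden=8))
print(y.shape)  # (5, 6, 4)
```

The output keeps the input's spatial size and channel count, so a block of this kind can stand in wherever a 3×3 convolution with matching shapes is used.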

💡 Why This Matters

CoTNet gives vision systems a stronger backbone network, which the paper validates across image recognition, object detection, instance segmentation, and semantic segmentation.

📋 Article Details

Category 🤖 Artificial Intelligence
Published Apr 01, 2022
Journal IEEE Transactions on Pattern Analysis and Machine Intelligence
Authors Yehao Li, Ting Yao, Yingwei Pan, Tao Mei
DOI 10.1109/tpami.2022.3164083
Citations 698
Source OpenAlex