Multimodal Learning With Transformers: A Survey

📅 May 11, 2023 👤 Peng Xu, Xiatian Zhu, David A. Clifton 📖 IEEE Transactions on Pattern Analysis and Machine Intelligence 📊 848 citations

🤖 Plain-English Summary

The Transformer is a neural network architecture that has achieved strong results across a wide range of machine learning tasks. This paper is a comprehensive survey of Transformer techniques for multimodal data.

🔑 Key Findings

  • Thanks to the recent prevalence of multimodal applications and Big Data, Transformer-based multimodal learning has become a hot topic in AI research.
  • The survey covers five areas:
      (1) background on multimodal learning, the Transformer ecosystem, and the multimodal Big Data era;
      (2) a systematic review of the Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective;
      (3) a review of multimodal Transformer applications under two paradigms: multimodal pretraining and specific multimodal tasks;
      (4) a summary of the challenges and designs common to multimodal Transformer models and applications;
      (5) a discussion of open problems and potential research directions for the community.

💡 Why This Matters

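As a survey, this paper gives researchers a consolidated view of Transformer-based multimodal learning: the main model families, the two application paradigms (pretraining and task-specific use), the challenges and designs they share, and the problems that remain open.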

📋 Article Details

Category 🤖 Artificial Intelligence
Published May 11, 2023
Journal IEEE Transactions on Pattern Analysis and Machine Intelligence
Authors Peng Xu, Xiatian Zhu, David A. Clifton
DOI 10.1109/tpami.2023.3275156
Citations 848
Source OpenAlex
