MaPLe: Multi-modal Prompt Learning

📅 June 1, 2023 👤 Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz et al. 📖 Research Journal 📊 727 citations

🤖 Plain-English Summary

Pre-trained vision-language (V-L) models such as CLIP generalize well to downstream tasks. MaPLe (Multi-modal Prompt Learning) adapts CLIP by learning prompts in both its vision and language branches. Compared with the state-of-the-art method Co-CoOp, MaPLe achieves an absolute gain of 3.45% on novel classes and 2.72% on the overall harmonic mean, averaged over 11 diverse image recognition datasets.
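
The harmonic mean mentioned above is a single score that balances two accuracies and drops sharply when either one is low. The snippet below is a minimal sketch, assuming the score is the harmonic mean of accuracy on base (seen) classes and accuracy on novel classes. The function name and the input numbers are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of two accuracies given in percent."""
    if base_acc + novel_acc == 0:
        # Avoid division by zero when both accuracies are zero.
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# Hypothetical accuracies, chosen only to illustrate the arithmetic.
base, novel = 80.0, 60.0
print(f"{harmonic_mean(base, novel):.2f}")  # prints 68.57
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method cannot score well on it by doing well on base classes alone.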

🔑 Key Findings

  • V-L models such as CLIP are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well.
  • Inspired by the Natural Language Processing (NLP) literature, recent CLIP adaptation approaches learn prompts as the textual inputs to fine-tune CLIP for downstream tasks.
  • The authors note that prompting only a single branch of CLIP (language or vision) is sub-optimal, because it does not allow both representation spaces to be adjusted dynamically for a downstream task.
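
The idea of prompting both branches can be pictured with a toy sketch in plain Python. It is not the authors' implementation. It assumes that a small set of prompt vectors is prepended to the text token embeddings, and that matching vision prompts are obtained from the text prompts through a linear map before being prepended to the image patch embeddings. All names, sizes, and the random initialisation are hypothetical, and in a real setting the prompt vectors and the map would be trainable tensors.

```python
import random


def make_matrix(rows, cols, seed=0):
    """Small random matrix standing in for learnable parameters."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def linear_map(vectors, weight):
    """Project each vector with a weight of shape (out_dim, in_dim)."""
    return [[sum(w * x for w, x in zip(row, v)) for row in weight]
            for v in vectors]


def prepend(prompts, tokens):
    """Place prompt vectors in front of the existing token sequence."""
    return prompts + tokens


# Hypothetical sizes, far smaller than a real CLIP model.
num_prompts, text_dim, vision_dim = 2, 4, 6

# Prompts for the language branch (assumed to be the learnable part).
text_prompts = make_matrix(num_prompts, text_dim, seed=0)

# Assumed coupling: vision prompts are a linear function of text prompts.
coupling = make_matrix(vision_dim, text_dim, seed=1)
vision_prompts = linear_map(text_prompts, coupling)

# Placeholder embeddings for 3 text tokens and 5 image patches.
word_embeddings = [[0.0] * text_dim for _ in range(3)]
patch_embeddings = [[0.0] * vision_dim for _ in range(5)]

text_input = prepend(text_prompts, word_embeddings)        # 5 vectors of size 4
vision_input = prepend(vision_prompts, patch_embeddings)   # 7 vectors of size 6
```

Tying the vision prompts to the text prompts is one way to let a single set of parameters adjust both representation spaces together, in contrast to the single-branch prompting described in the last finding.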

💡 Why This Matters

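Prompt learning adapts a pre-trained model such as CLIP to new tasks by learning its prompts instead of relying on hand-picked templates. The reported gains on novel classes suggest that prompting both the vision and language branches helps the adapted model generalize beyond the classes it was tuned on.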

📋 Article Details

Category 🤖 Artificial Intelligence
Published Jun 01, 2023
Journal Research Journal
Authors Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
DOI 10.1109/cvpr52729.2023.01832
Citations 727
Source OpenAlex
