🤖 Artificial Intelligence OpenAlex

MaPLe: Multi-modal Prompt Learning

📅 Published: June 1, 2023 👤 Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz et al. 📖 CVPR 2023 📊 727 citations
AI-Generated Summary

Pre-trained vision-language (V-L) models such as CLIP generalize well to downstream tasks. MaPLe, a multi-modal prompt learning method, compares favorably with the prior method Co-CoOp: averaged over 11 diverse image recognition datasets, it achieves an absolute gain of 3.45% on novel classes and 2.72% on the overall harmonic mean.
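
The "overall harmonic mean" figure can be made concrete with a small calculation. The sketch below assumes the usual base-to-novel reading, in which the harmonic mean combines accuracy on base (seen) classes with accuracy on novel (unseen) classes; the function name and the accuracy values are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of two accuracies given in percent.

    Assumption: this combines base-class and novel-class accuracy,
    as is common in base-to-novel generalization benchmarks.
    """
    if base_acc <= 0 or novel_acc <= 0:
        return 0.0  # the harmonic mean is zero if either accuracy is zero
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# Hypothetical accuracies, chosen only to illustrate the arithmetic.
balanced = harmonic_mean(80.0, 70.0)
imbalanced = harmonic_mean(90.0, 50.0)

print(round(balanced, 2))
print(round(imbalanced, 2))
```

Unlike the arithmetic mean, the harmonic mean drops sharply when one of the two accuracies lags, so a method cannot score well by fitting base classes at the expense of novel ones.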


Key Findings
  1. V-L models such as CLIP are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well.
  2. Inspired by the natural language processing (NLP) literature, recent CLIP adaptation approaches learn prompts as textual inputs in order to adapt CLIP to downstream tasks.
  3. Prompting only a single branch of CLIP (language or vision) is sub-optimal, because it lacks the flexibility to dynamically adjust both representation spaces on a downstream task.
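
To illustrate what prompting both branches means, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names, dimensions, and token counts are hypothetical, the embeddings are placeholder zeros, and details such as special tokens, prompt placement, and any coupling between the two sets of prompts are omitted.

```python
import random


def init_prompts(num_prompts, dim, seed=0):
    """Create small random prompt vectors (these would be the learnable parameters)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompts(prompts, tokens):
    """Return a new token sequence with the prompt vectors placed before the input tokens."""
    return [list(p) for p in prompts] + [list(t) for t in tokens]


# Hypothetical sizes, chosen only for illustration.
TEXT_DIM, IMAGE_DIM, NUM_PROMPTS = 8, 12, 2

# One set of prompts per branch, so both representation spaces can adapt.
text_prompts = init_prompts(NUM_PROMPTS, TEXT_DIM, seed=0)
image_prompts = init_prompts(NUM_PROMPTS, IMAGE_DIM, seed=1)

# Stand-ins for word embeddings (text branch) and patch embeddings (vision branch).
text_tokens = [[0.0] * TEXT_DIM for _ in range(5)]
image_tokens = [[0.0] * IMAGE_DIM for _ in range(9)]

# In prompt learning the pre-trained encoders stay frozen; only the prompts are trained.
text_input = prepend_prompts(text_prompts, text_tokens)
image_input = prepend_prompts(image_prompts, image_tokens)

print(len(text_input), len(image_input))
```

A single-branch method would build only one of `text_input` or `image_input` this way and feed the other encoder its unmodified tokens.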
Why It Matters

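Prompt learning adapts a pre-trained V-L model to new recognition tasks by learning prompts rather than relying on hand-crafted templates. The reported gains on novel classes suggest that prompting both the vision and language branches helps the adapted model generalize beyond the classes seen during adaptation.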

This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source OpenAlex
Category 🤖 Artificial Intelligence
Published Jun 1, 2023
Venue IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023
DOI 10.1109/cvpr52729.2023.01832
Citations 727
Authors Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan