MaPLe: Multi-modal Prompt Learning

AI-Generated Summary

Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. Compared with the advanced method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes and 2.72% on overall harmonic-mean, averaged over 11 diverse image recognition datasets.

⚡ This is an original paraphrased summary — not copied from the abstract. Full paper available at the source link below.

Key Findings

1 However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well.
2 Inspired by the Natural Language Processing (NLP) literature, recent CLIP adaptation approaches learn prompts as the textual inputs to fine-tune CLIP for downstream tasks.
3 We note that using prompting to adapt representations in a single branch of CLIP (language or vision) is sub-optimal since it does not allow the flexibility to dynamically adjust both representation spaces on a downstream task.

Why It Matters

This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.

This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex

More Artificial Intelligence Papers ← Back to Hub 📚 Learning Hub

Article Details

Source	OpenAlex
Category	🤖 Artificial Intelligence
Published	Jun 1, 2023
Journal	Research Journal
DOI	10.1109/cvpr52729.2023.01832
Citations	727
Authors	Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan