🤖 Artificial Intelligence OpenAlex

ImageBind: One Embedding Space to Bind Them All

📅 Published: June 1, 2023 👤 Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu et al. 📖 Research Journal 📊 701 citations
AI-Generated Summary

We present ImageBind, an approach to learn a joint embedding across six different modalities: images, text, audio, depth, thermal, and IMU data. The emergent capabilities improve with the strength of the image encoder, and we set a new state of the art on emergent zero-shot recognition tasks across modalities, outperforming specialist supervised models.
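
As a rough illustration of how image-paired data could pull a second modality into the image embedding space, the sketch below computes a symmetric InfoNCE-style contrastive loss in plain Python. The summary does not state the training objective, so the loss, the temperature of 0.07, the function names, and the toy 2-D vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, keys, temperature=0.07):
    """Mean contrastive loss; queries[i] and keys[i] form the positive pair."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_denominator - logits[i]
    return total / len(q)


def symmetric_info_nce(image_embs, other_embs, temperature=0.07):
    """Sum of the image-to-other and other-to-image losses."""
    return (info_nce(image_embs, other_embs, temperature)
            + info_nce(other_embs, image_embs, temperature))


# Hypothetical embeddings: two images and the audio clips paired with them.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each clip sits near its own image
audio_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # each clip sits near the wrong image

aligned_loss = symmetric_info_nce(image_embs, audio_aligned)
shuffled_loss = symmetric_info_nce(image_embs, audio_shuffled)
print(f"aligned: {aligned_loss:.4f}  shuffled: {shuffled_loss:.4f}")
```

In this toy setup the loss is close to zero when each audio embedding lies near its paired image and much larger when the pairs are swapped. Minimizing a loss of this kind separately for each (image, other modality) pair is one way a shared space could emerge without ever pairing, say, audio with depth directly.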


Key Findings
  1. We show that not all combinations of paired data are necessary to train such a joint embedding; image-paired data alone is sufficient to bind the modalities together.
  2. ImageBind can leverage recent large-scale vision-language models and extend their zero-shot capabilities to new modalities just by using their natural pairing with images.
  3. It enables novel emergent applications ‘out-of-the-box’, including cross-modal retrieval, composing modalities with arithmetic, and cross-modal detection and generation.
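
Two of the applications named above, cross-modal retrieval and composing modalities with arithmetic, can be sketched with nothing more than cosine similarity and vector addition once every modality lives in one space. The example below is a minimal sketch under stated assumptions: the 3-D vectors are hand-made stand-ins for encoder outputs, and the names `retrieve`, `compose`, `image_gallery`, and `barking_audio` are hypothetical, not part of any released ImageBind code.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, gallery):
    """Return gallery keys ranked by cosine similarity to the query, best first."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)


def compose(a, b):
    """Combine two embeddings by adding their unit vectors and renormalizing."""
    a, b = normalize(a), normalize(b)
    return normalize([x + y for x, y in zip(a, b)])


# Hypothetical image embeddings; the axes loosely mean (dog, beach, rain).
image_gallery = {
    "dog in a park": [0.9, 0.1, 0.0],
    "empty beach": [0.0, 1.0, 0.1],
    "dog on a beach": [0.7, 0.7, 0.0],
    "rainy street": [0.0, 0.1, 1.0],
}

# Hypothetical audio embedding of a barking sound in the same space.
barking_audio = [1.0, 0.0, 0.1]

# Cross-modal retrieval: an audio query ranks images.
audio_to_image = retrieve(barking_audio, image_gallery)

# Embedding arithmetic: beach image + barking audio as a single query.
combined_query = compose(image_gallery["empty beach"], barking_audio)
combined_to_image = retrieve(combined_query, image_gallery)

print(audio_to_image[0])
print(combined_to_image[0])
```

With these toy vectors, the barking query ranks "dog in a park" first, while the composed query ranks "dog on a beach" first, because the sum carries both the dog and the beach direction. Real embeddings are far higher-dimensional, but the retrieval and composition steps would follow the same pattern.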
Why It Matters

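By binding six modalities through image-paired data alone, ImageBind extends the zero-shot capabilities of existing vision-language models to audio, depth, thermal, and IMU data, and it enables applications such as cross-modal retrieval and generation without requiring paired data for every combination of modalities.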

This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source: OpenAlex
Category: 🤖 Artificial Intelligence
Published: Jun 1, 2023
Journal: Research Journal
DOI: 10.1109/cvpr52729.2023.01457
Citations: 701
Authors: Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala