🤖 Artificial Intelligence OpenAlex

MDETR - Modulated Detection for End-to-End Multi-Modal Understanding

📅 October 1, 2021 👤 Aishwarya Kamath, Mannat Singh, Yann LeCun et al. 📖 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 📊 673 citations

🤖 Plain-English Summary

Multi-modal reasoning systems typically rely on a pre-trained object detector to extract regions of interest from an image. MDETR instead detects objects conditioned directly on a text query, and the authors report that it extends easily to visual question answering, with competitive performance on GQA and CLEVR.

🔑 Key Findings

  • The pre-trained object detector that multi-modal systems rely on is typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of objects and attributes.
  • That fixed vocabulary makes it challenging for such systems to capture the long tail of visual concepts expressed in free-form text.
  • The paper proposes MDETR, an end-to-end modulated detector that detects objects in an image conditioned on a raw text query, such as a caption or a question.

💡 Why This Matters

By training the detector end to end and conditioning it on raw text, MDETR targets the long tail of visual concepts that detectors with a fixed vocabulary struggle to capture.


📋 Article Details

Category 🤖 Artificial Intelligence
Published Oct 01, 2021
Journal 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Authors Aishwarya Kamath, Mannat Singh, Yann LeCun, Gabriel Synnaeve, Ishan Misra
DOI 10.1109/iccv48922.2021.00180
Citations 673
Source OpenAlex
