Vision-Language Models for Vision Tasks: A Survey

📅 February 26, 2024 👤 J Zhang, Jiaxing Huang, Sheng Jin et al. 📖 IEEE Transactions on Pattern Analysis and Machine Intelligence 📊 720 citations

🤖 Plain-English Summary

Most visual recognition studies rely heavily on crowd-labelled data for training deep neural networks (DNNs), and they usually train a separate DNN for each visual recognition task, which makes the paradigm laborious and time-consuming. To address these two challenges, Vision-Language Models (VLMs) have recently been investigated intensively. They learn rich vision-language correlations from web-scale image-text pairs, which are almost infinitely available on the Internet, and enable z...

🔑 Key Findings

  • To address two challenges (heavy reliance on crowd-labelled data and training a separate DNN for each task), Vision-Language Models (VLMs) have recently been investigated intensively. They learn rich vision-language correlations from web-scale image-text pairs, which are almost infinitely available on the Internet, and enable zero-shot predictions on various visual recognition tasks with a single VLM.
  • The paper provides a systematic review of vision-language models for visual recognition tasks, covering: (1) background on the development of visual recognition paradigms; (2) VLM foundations, summarizing widely adopted network architectures, pre-training objectives, and downstream tasks; (3) widely adopted datasets for VLM pre-training and evaluation; (4) a review and categorization of existing VLM pre-training, transfer learning, and knowledge distillation methods; (5) benchmarking, analysis, and discussion of the reviewed methods; and (6) research challenges and potential directions for future VLM studies in visual recognition.
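
To make "zero-shot predictions with a single VLM" concrete, here is a minimal sketch of one common approach: score an image embedding against text embeddings of class prompts and pick the closest match. This is an illustration under stated assumptions, not the survey's own method. The function names, the prompt wording, the temperature value, and the three-dimensional toy embeddings are all hypothetical; a real VLM would produce the embeddings with its image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, class_text_embeddings, temperature=0.07):
    """Return a probability per class prompt from embedding similarity.

    No classifier is trained for the task: new classes are added simply by
    embedding new text prompts. The temperature is an assumed value.
    """
    image = l2_normalize(image_embedding)
    prompts = list(class_text_embeddings)
    logits = []
    for prompt in prompts:
        text = l2_normalize(class_text_embeddings[prompt])
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(cosine / temperature)
    return dict(zip(prompts, softmax(logits)))


# Hypothetical toy embeddings standing in for encoder outputs.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
TOY_IMAGE_EMBEDDING = [0.8, 0.2, 0.1]

probabilities = zero_shot_classify(TOY_IMAGE_EMBEDDING, TOY_TEXT_EMBEDDINGS)
best_prompt = max(probabilities, key=probabilities.get)
print(best_prompt)
```

Because the class set is defined only by the text prompts, the same embeddings can serve a different recognition task by swapping in a different prompt dictionary.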

💡 Why This Matters

A single pre-trained VLM can make zero-shot predictions across many visual recognition tasks, which reduces the dependence on crowd-labelled data and on training a separate network for each task.

📋 Article Details

Category 🤖 Artificial Intelligence
Published Feb 26, 2024
Journal IEEE Transactions on Pattern Analysis and Machine Intelligence
Authors J Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu
DOI 10.1109/tpami.2024.3369699
Citations 720
Source OpenAlex
