Most visual recognition studies rely heavily on crowd-labelled data to train deep neural networks (DNNs), and they usually train a separate DNN for each visual recognition task, leading to a laborious and time-consuming visual recognition paradigm. To address these two challenges, Vision-Language Models (VLMs) have been intensively investigated recently. They learn rich vision-language correlation from web-scale image-text pairs that are almost infinitely available on the Internet and enable z...
This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Feb 26, 2024 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Authors | J Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu |
| DOI | 10.1109/tpami.2024.3369699 |
| Citations | 720 |
| Source | OpenAlex |