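Vision-Language Pre-training (VLP) has advanced performance on many vision-language tasks. BLIP (Bootstrapping Language-Image Pre-training) is a VLP framework that transfers flexibly to both vision-language understanding and generation tasks. It also shows strong generalization when transferred directly to video-language tasks in a zero-shot manner.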
| Field | Value |
|---|---|
| Category | Artificial Intelligence |
| Published | Jan 28, 2022 |
| Venue | arXiv preprint (Cornell University) |
| Authors | Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi |
| DOI | 10.48550/arxiv.2201.12086 |
| Citations | 867 |
| Source | OpenAlex |