Our objective in this work is video-text retrieval, in particular a joint embedding that enables efficient text-to-video retrieval. We also provide a new video-text pretraining dataset, WebVid-2M, comprising over two million videos with weak captions scraped from the internet.
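As a rough illustration of why a joint embedding makes text-to-video retrieval efficient: once text and videos are mapped into the same vector space, video embeddings can be computed once and stored, and each text query is reduced to a similarity ranking over those stored vectors. The sketch below assumes cosine similarity as the scoring function and uses hand-made toy vectors; the function names `l2_normalize` and `rank_videos` and all values are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_videos(text_embedding, video_embeddings):
    """Return video indices sorted from most to least similar to the text query."""
    query = l2_normalize(text_embedding)
    scores = []
    for index, video in enumerate(video_embeddings):
        unit = l2_normalize(video)
        similarity = sum(q * v for q, v in zip(query, unit))
        scores.append((similarity, index))
    # Highest similarity first; ties are broken by the lower index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scores]


# Toy 3-dimensional embeddings (hypothetical values, not model outputs).
video_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
]
text_embedding = [0.1, 0.9, 0.1]

ranking = rank_videos(text_embedding, video_embeddings)
print(ranking)
```

In a real system the embeddings would come from trained text and video encoders, and the video side would typically be normalized ahead of time so that only the query needs processing at search time.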
| Category | 🤖 Artificial Intelligence |
| Published | Jan 01, 2022 |
| Journal | Oxford University Research Archive (ORA) (University of Oxford) |
| Authors | A. Zisserman, Arsha Nagrani, Gül Varol, M. Bain |
| Citations | 752 |
| Source | OpenAlex |