The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. Despite having 54x fewer trainable parameters, our model outperforms Flamingo80B by 8.7% on zero-shot VQAv2.
This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Jan 30, 2023 |
| Journal | arXiv (Cornell University) |
| Authors | Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi |
| DOI | 10.48550/arxiv.2301.12597 |
| Citations | 914 |
| Source | OpenAlex |