Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion.
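
One common way to picture a language-guided edit in a joint embedding space is to rotate an image embedding toward the direction that separates two text embeddings. The following is a minimal sketch of that idea, not the authors' implementation. It assumes unit-normalized embeddings and uses random toy vectors in place of real CLIP outputs; the names `normalize`, `slerp`, `text_guided_edit`, and `strength` are hypothetical and chosen only for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def slerp(a, b, t):
    """Spherically interpolate between unit vectors a and b for t in [0, 1]."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)
    s = math.sin(omega)
    if s < 1e-8:
        # Degenerate case (parallel or antipodal vectors): fall back to a.
        return list(a)
    wa = math.sin((1.0 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def text_guided_edit(image_emb, source_text_emb, target_text_emb, strength):
    """Rotate an image embedding toward the normalized text difference.

    strength=0 leaves the (normalized) image embedding unchanged;
    strength=1 lands exactly on the text-difference direction.
    """
    direction = normalize(
        [t - s for t, s in zip(target_text_emb, source_text_emb)]
    )
    return slerp(normalize(image_emb), direction, strength)


# Toy stand-ins for embeddings (assumption: a real system would obtain
# these from CLIP's image and text encoders, which are not called here).
rng = random.Random(0)
dim = 8
image_emb = normalize([rng.gauss(0, 1) for _ in range(dim)])
source_text_emb = normalize([rng.gauss(0, 1) for _ in range(dim)])
target_text_emb = normalize([rng.gauss(0, 1) for _ in range(dim)])

edited = text_guided_edit(image_emb, source_text_emb, target_text_emb, 0.25)
```

In a full pipeline the edited embedding would still need a decoder to turn it back into an image; the sketch covers only the embedding-space step.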
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Apr 13, 2022 |
| Journal | arXiv (Cornell University) |
| Authors | Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen |
| DOI | 10.48550/arxiv.2204.06125 |
| Citations | 2,283 |
| Source | OpenAlex |