By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner.
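To make the cross-attention idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the common arrangement in which flattened image features supply the queries and an encoded conditioning input (for example, text tokens) supplies the keys and values. All names (`cross_attention`, `random_matrix`, the projection matrices `w_q`, `w_k`, `w_v`) and all sizes are hypothetical, and random numbers stand in for learned weights and real features.

```python
import math
import random


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using nested lists as matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, cond_tokens, w_q, w_k, w_v):
    """Attend from image features (queries) to conditioning tokens (keys, values)."""
    q = matmul(image_tokens, w_q)   # (n_img, d)
    k = matmul(cond_tokens, w_k)    # (n_cond, d)
    v = matmul(cond_tokens, w_v)    # (n_cond, d)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # (d, n_cond)
    scores = matmul(q, k_t)                       # (n_img, n_cond)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights            # (n_img, d), (n_img, n_cond)


def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights or real features.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_img, n_cond, d_img, d_cond, d = 16, 5, 8, 12, 4   # illustrative sizes only
image_tokens = random_matrix(n_img, d_img, rng)     # e.g. a flattened 4x4 feature map
cond_tokens = random_matrix(n_cond, d_cond, rng)    # e.g. 5 encoded text tokens
w_q = random_matrix(d_img, d, rng)
w_k = random_matrix(d_cond, d, rng)
w_v = random_matrix(d_cond, d, rng)

out, weights = cross_attention(image_tokens, cond_tokens, w_q, w_k, w_v)
print(len(out), len(out[0]), len(weights[0]))  # 16 4 5
```

Each image position receives a weighted mixture of the conditioning tokens, which is how the conditioning signal can steer generation at every spatial location. The same function works for any kind of conditioning input, provided it can be encoded as a sequence of token vectors.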
This research advances how AI systems generate high-resolution images from conditioning inputs such as text or bounding boxes.
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Jun 01, 2022 |
| Journal | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Authors | Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer |
| DOI | 10.1109/cvpr52688.2022.01042 |
| Citations | 13,557 |
| Source | OpenAlex |