
High-Resolution Image Synthesis with Latent Diffusion Models

📅 June 1, 2022 👤 Robin Rombach, Andreas Blattmann, Dominik Lorenz et al. 📖 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 📊 13,557 citations

🤖 Plain-English Summary

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner.
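As a rough illustration of the cross-attention idea mentioned above, the following minimal NumPy sketch lets flattened image features (queries) attend to conditioning tokens such as text embeddings (keys and values). The function name `cross_attention`, the random projection matrices, and all dimensions are assumptions chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative sketch).

    latent_tokens: (n_latent, d_latent) flattened image features
    cond_tokens:   (n_cond, d_cond) conditioning embeddings, e.g. text tokens
    w_q, w_k, w_v: projection matrices into a shared attention width d
    """
    q = latent_tokens @ w_q  # queries come from the image side
    k = cond_tokens @ w_k    # keys come from the conditioning input
    v = cond_tokens @ w_v    # values come from the conditioning input
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each image position attends over all tokens
    return weights @ v, weights


# Hypothetical sizes: 16 image positions, 5 conditioning tokens.
rng = np.random.default_rng(0)
n_latent, n_cond, d_latent, d_cond, d = 16, 5, 8, 12, 4
latent_tokens = rng.normal(size=(n_latent, d_latent))
cond_tokens = rng.normal(size=(n_cond, d_cond))
w_q = rng.normal(size=(d_latent, d))
w_k = rng.normal(size=(d_cond, d))
w_v = rng.normal(size=(d_cond, d))

out, weights = cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Because the keys and values are computed only from the conditioning tokens, the same layer can in principle accept any input that can be embedded as a token sequence, which is the flexibility the summary refers to.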

🔑 Key Findings

  • The diffusion formulation allows for a guiding mechanism that controls the image generation process without retraining.
  • Because diffusion models typically operate directly in pixel space, optimizing powerful DMs often consumes hundreds of GPU days, and inference is expensive due to sequential evaluations.
  • To enable DM training on limited computational resources while retaining quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders.
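To make the pixel-space versus latent-space trade-off concrete, here is a small sketch that compares how many values a denoiser would process in each space and applies one standard forward-noising step to a stand-in latent. The downsampling factor `f=8`, the 4 latent channels, and the helper names `latent_shape` and `add_noise` are illustrative assumptions, not figures taken from this page; the noising rule is the usual closed-form Gaussian corruption used in diffusion models.

```python
import numpy as np


def latent_shape(height, width, f, channels):
    """Shape of a latent produced by an encoder that downsamples by factor f."""
    if height % f or width % f:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // f, width // f, channels)


def add_noise(z0, alpha_bar_t, rng):
    """Closed-form forward step: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    eps = rng.normal(size=z0.shape)
    z_t = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return z_t, eps


# Assumed example: a 256x256 RGB image and an encoder with f=8, 4 channels.
pixel_shape = (256, 256, 3)
z_shape = latent_shape(256, 256, f=8, channels=4)

pixel_values = int(np.prod(pixel_shape))
latent_values = int(np.prod(z_shape))
ratio = pixel_values // latent_values
print(z_shape, pixel_values, latent_values, ratio)

# The diffusion process then runs on the latent instead of the pixels.
rng = np.random.default_rng(0)
z0 = rng.normal(size=z_shape)  # stand-in for an encoded image
z_t, eps = add_noise(z0, alpha_bar_t=0.5, rng=rng)
```

Under these assumed settings the denoiser sees 48 times fewer values per sample, which gives a sense of why moving the diffusion process into a compressed latent space reduces training and sampling cost.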

💡 Why This Matters

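Running the diffusion process in the latent space of a pretrained autoencoder, rather than in pixel space, lowers the compute needed to train and sample from diffusion models while retaining their quality and flexibility. That makes high-resolution image synthesis conditioned on inputs such as text or bounding boxes feasible on limited computational resources.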


📋 Article Details

Category 🤖 Artificial Intelligence
Published Jun 01, 2022
Venue 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
DOI 10.1109/cvpr52688.2022.01042
Citations 13,557
Source OpenAlex
