🤖 Artificial Intelligence OpenAlex

Adding Conditional Control to Text-to-Image Diffusion Models

📅 October 1, 2023 👤 Lvmin Zhang, Anyi Rao, Maneesh Agrawala 📖 Research Journal 📊 3,568 citations

🤖 Plain-English Summary

We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets.

🔑 Key Findings

  • ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls.
  • The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning.
  • We test various conditioning controls, e.g., edges, depth, segmentation, human pose, etc., with Stable Diffusion, using single or multiple conditions, with or without prompts.
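
The zero-convolution idea in the second finding can be illustrated with a small sketch. It assumes a frozen block whose output receives a residual from a trainable copy, with a zero-initialized layer on each side of that copy. The functions `locked_block` and `trainable_copy` are hypothetical stand-ins for pretrained layers, and a 1x1 convolution is reduced to a per-location linear map across channels. This is not the authors' implementation, only a way to see why the added branch is harmless at the start of finetuning.

```python
import math

def zero_conv(weights, bias, x):
    """1x1 convolution at one spatial location: a linear map across channels."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def make_zero_conv(channels):
    """Weights and bias both start at zero, so the layer initially outputs zeros."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels

def locked_block(x):
    """Hypothetical stand-in for a frozen pretrained block (never updated)."""
    return [math.tanh(2.0 * v + 0.5) for v in x]

def trainable_copy(x):
    """Hypothetical stand-in for the trainable copy; it starts from the same weights."""
    return [math.tanh(2.0 * v + 0.5) for v in x]

def controlled_block(x, condition, z_in, z_out):
    # Inject the conditioning signal through the first zero convolution.
    injected = [a + b for a, b in zip(x, zero_conv(*z_in, condition))]
    # Run the trainable copy, then pass its output through the second zero convolution.
    residual = zero_conv(*z_out, trainable_copy(injected))
    # Add the residual to the frozen block's output.
    return [a + b for a, b in zip(locked_block(x), residual)]

x = [0.3, -1.2, 0.8]            # illustrative feature vector
condition = [1.0, 0.0, -0.5]    # illustrative conditioning vector (e.g. an edge-map feature)
z_in, z_out = make_zero_conv(3), make_zero_conv(3)

baseline = locked_block(x)
controlled = controlled_block(x, condition, z_in, z_out)
print(controlled == baseline)
```

With both zero convolutions at their initial values, the controlled output is identical to the frozen block's output, so the pretrained model's behavior is preserved at the first training step. The gradient with respect to a zero-initialized weight depends on the layer's input rather than on the weight itself, so the weights can still move away from zero as training proceeds.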

💡 Why This Matters

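ControlNet lets users steer a pretrained text-to-image model with spatial inputs such as edges, depth maps, segmentation maps, or human poses, while the base model itself stays locked. Because training is reported to be robust on both small (<50k) and large (>1m) datasets, new kinds of control can be added without retraining the underlying diffusion model.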

📋 Article Details

Category 🤖 Artificial Intelligence
Published Oct 01, 2023
Journal Research Journal
Authors Lvmin Zhang, Anyi Rao, Maneesh Agrawala
DOI 10.1109/iccv51070.2023.00355
Citations 3,568
Source OpenAlex
