🤖 Artificial Intelligence OpenAlex

Adding Conditional Control to Text-to-Image Diffusion Models

📅 Published: October 1, 2023 👤 Lvmin Zhang, Anyi Rao, Maneesh Agrawala 📖 IEEE/CVF International Conference on Computer Vision (ICCV 2023) 📊 3,568 citations
AI-Generated Summary

We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. We show that the training of ControlNets is robust with both small (<50k) and large (>1M) datasets.


Key Findings
  • ControlNet locks (freezes) the production-ready large diffusion model and reuses its deep, robust encoding layers, pretrained on billions of images, as a strong backbone: a trainable copy of these layers learns a diverse set of conditional controls.
  • The trainable copy is connected to the locked model through "zero convolutions" (1×1 convolution layers whose weights and biases are initialized to zero). These layers progressively grow their parameters from zero, ensuring that no harmful noise affects the finetuning.
  • The authors test various conditioning controls (e.g., edges, depth, segmentation, and human pose) with Stable Diffusion, using single or multiple conditions, with or without text prompts.
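To make the locked-copy and zero-convolution idea in the findings above concrete, here is a minimal pure-Python sketch. It follows the paper's formulation of a controlled block, y_c = F(x; Θ) + Z(F(x + Z(c; Θ_z1); Θ_c); Θ_z2), where F is a pretrained block, Z is a zero convolution and c is the condition. Everything else is an assumption for illustration: the toy `Block` (a 1×1 convolution plus ReLU standing in for a real encoder block), the hypothetical helper names `conv1x1`, `zero_conv`, `ControlledBlock` and `conv1x1_weight_grad`, and all numeric values. It is not the authors' implementation and uses no deep-learning library.

```python
import copy
from typing import List, Tuple

FeatureMap = List[List[float]]  # [pixel][channel]
Matrix = List[List[float]]      # [out_channel][in_channel]


def conv1x1(x: FeatureMap, weight: Matrix, bias: List[float]) -> FeatureMap:
    """A 1x1 convolution applies the same linear map over channels at every pixel."""
    return [[sum(w * v for w, v in zip(w_row, pixel)) + b
             for w_row, b in zip(weight, bias)]
            for pixel in x]


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    return [[u + v for u, v in zip(pa, pb)] for pa, pb in zip(a, b)]


def zero_conv(channels: int) -> Tuple[Matrix, List[float]]:
    """Hypothetical helper: a 1x1 conv whose weight and bias both start at zero."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels


class Block:
    """Toy stand-in for one pretrained encoder block: 1x1 conv followed by ReLU."""

    def __init__(self, weight: Matrix, bias: List[float]):
        self.weight, self.bias = weight, bias

    def __call__(self, x: FeatureMap) -> FeatureMap:
        return [[max(0.0, v) for v in p] for p in conv1x1(x, self.weight, self.bias)]


class ControlledBlock:
    """y_c = F(x) + Z_out(F_copy(x + Z_in(c))), with F frozen and F_copy trainable."""

    def __init__(self, locked: Block, channels: int):
        self.locked = locked                    # frozen: never updated
        self.trainable = copy.deepcopy(locked)  # trainable copy, starts equal to locked
        self.z_in = zero_conv(channels)
        self.z_out = zero_conv(channels)

    def __call__(self, x: FeatureMap, c: FeatureMap) -> FeatureMap:
        h = self.trainable(add(x, conv1x1(c, *self.z_in)))
        return add(self.locked(x), conv1x1(h, *self.z_out))


def conv1x1_weight_grad(upstream: FeatureMap, inp: FeatureMap) -> Matrix:
    """dL/dW[o][i] = sum over pixels p of upstream[p][o] * inp[p][i]."""
    return [[sum(g[o] * h[i] for g, h in zip(upstream, inp))
             for i in range(len(inp[0]))]
            for o in range(len(upstream[0]))]


# Toy values (not from the paper): 2 pixels, 2 channels.
locked = Block([[0.5, -0.2], [0.3, 0.8]], [0.1, 0.1])
block = ControlledBlock(locked, channels=2)
x = [[1.0, 2.0], [0.5, -1.0]]      # input features
cond = [[0.3, 0.7], [-0.4, 0.9]]   # condition features (e.g. an encoded edge map)

# 1) At initialization the zero convolutions add nothing,
#    so the controlled output equals the locked model's output.
assert block(x, cond) == locked(x)

# 2) The output zero conv still receives a non-zero weight gradient, because its
#    input h (the trainable copy's activations) is non-zero. One SGD step moves it off zero.
upstream = [[1.0, -1.0], [0.5, 0.5]]  # assumed dL/d(output) from some loss
h = block.trainable(add(x, conv1x1(cond, *block.z_in)))
grad = conv1x1_weight_grad(upstream, h)
w_out, _ = block.z_out
for o, row in enumerate(grad):
    for i, g in enumerate(row):
        w_out[o][i] -= 0.1 * g
print(w_out)
```

The two checks mirror the second finding: at step zero the controlled block reproduces the locked model exactly, so finetuning starts from the pretrained behavior, while the output zero convolution still gets a usable gradient. In this sketch the input zero convolution and the trainable copy receive no gradient on that first step, since their gradients flow back through the still-zero output convolution; they begin updating once it has moved off zero.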
Why It Matters

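ControlNet lets users guide a pretrained text-to-image model such as Stable Diffusion with spatial inputs (edge maps, depth maps, segmentation maps, human poses) rather than text prompts alone, and it adds this control without retraining or degrading the original model.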

This summary is based on publicly available metadata and the paper's abstract. For the full paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source: OpenAlex
Category: 🤖 Artificial Intelligence
Published: Oct 1, 2023
Venue: IEEE/CVF International Conference on Computer Vision (ICCV 2023)
DOI: 10.1109/iccv51070.2023.00355
Citations: 3,568
Authors: Lvmin Zhang, Anyi Rao, Maneesh Agrawala