Adding Conditional Control to Text-to-Image Diffusion Models

AI-Generated Summary

We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets.

⚡ This is an original paraphrased summary — not copied from the abstract. Full paper available at the source link below.

Key Findings

1 ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls.
2 The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning.
3 We test various conditioning controls, e.g., edges, depth, segmentation, human pose, etc., with Stable Diffusion, using single or multiple conditions, with or without prompts.

Why It Matters

This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.

This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex

More Artificial Intelligence Papers ← Back to Hub 📚 Learning Hub

Article Details

Source	OpenAlex
Category	🤖 Artificial Intelligence
Published	Oct 1, 2023
Journal	Research Journal
DOI	10.1109/iccv51070.2023.00355
Citations	3,568
Authors	Lvmin Zhang, Anyi Rao, Maneesh Agrawala