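We explore a new class of diffusion models based on the transformer architecture, called Diffusion Transformers (DiTs). We find that DiTs with higher Gflops (a measure of forward-pass compute), whether from increased transformer depth or width or from a larger number of input tokens, consistently achieve lower FID (Fréchet Inception Distance; lower is better).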
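To make the scaling claim concrete, here is a minimal sketch of how depth, width, and token count each drive forward-pass compute in a plain transformer. It is a back-of-the-envelope estimate, not the paper's Gflops accounting. The function names, the 32×32 input grid, the MLP expansion ratio of 4, and the convention of counting one multiply-add as two FLOPs are all assumptions made for illustration. The estimate also ignores embeddings, normalization, and any conditioning layers.

```python
def num_tokens(input_size: int, patch_size: int) -> int:
    """Tokens produced by cutting a square input_size x input_size grid
    into non-overlapping patch_size x patch_size patches."""
    if input_size % patch_size != 0:
        raise ValueError("input_size must be divisible by patch_size")
    return (input_size // patch_size) ** 2


def approx_forward_gflops(depth: int, width: int, tokens: int, mlp_ratio: int = 4) -> float:
    """Rough forward-pass GFLOPs for a stack of standard transformer blocks.

    Assumption: one multiply-add counts as 2 FLOPs. Only the large matrix
    multiplications are counted.
    """
    # Q, K, V and output projections: four (tokens x width) @ (width x width) matmuls.
    projections = 8 * tokens * width**2
    # Two MLP matmuls: width -> mlp_ratio * width -> width.
    mlp = 4 * mlp_ratio * tokens * width**2
    # Attention scores (Q @ K^T) plus the weighted sum over values.
    attention = 4 * tokens**2 * width
    return depth * (projections + mlp + attention) / 1e9


if __name__ == "__main__":
    # Illustrative configuration only; these values are assumptions, not taken from this page.
    depth, width, grid = 12, 384, 32
    for patch in (8, 4, 2):
        t = num_tokens(grid, patch)
        print(f"patch={patch} tokens={t} approx_gflops={approx_forward_gflops(depth, width, t):.2f}")
```

Under these assumptions, halving the patch size quadruples the token count, so compute rises by at least a factor of four without adding any parameters to the transformer blocks. Doubling depth doubles the estimate, and doubling width roughly quadruples the projection and MLP terms.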
| Field | Value |
| --- | --- |
| Category | ⚙️ Engineering & Technology |
| Published | Oct 01, 2023 |
| Journal | Research Journal |
| Authors | William Peebles, Saining Xie |
| DOI | 10.1109/iccv51070.2023.00387 |
| Citations | 1,407 |
| Source | OpenAlex |