We explore a new class of diffusion models based on the transformer architecture, called Diffusion Transformers (DiTs). We find that DiTs with higher Gflops, achieved through increased transformer depth/width or an increased number of input tokens, consistently have lower FID (Fréchet Inception Distance, where lower is better).
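To make the scaling claim concrete, the following is a rough sketch of how forward-pass compute grows with depth, width, and token count in a plain transformer. It is not the paper's accounting: the function names `num_tokens` and `approx_gmacs`, the MLP ratio of 4, and the example sizes are all assumptions chosen for illustration. The estimate counts only the multiply-accumulates in the attention and MLP matrix products and ignores embeddings, normalization, softmax, and any conditioning layers.

```python
def num_tokens(latent_size: int, patch_size: int) -> int:
    """Number of tokens when a square latent is cut into square patches."""
    if latent_size % patch_size != 0:
        raise ValueError("patch_size must divide latent_size")
    return (latent_size // patch_size) ** 2


def approx_gmacs(depth: int, width: int, tokens: int, mlp_ratio: int = 4) -> float:
    """Rough giga multiply-accumulates for one forward pass.

    Assumption: a standard transformer block with self-attention and a
    two-layer MLP; only matrix multiplications are counted.
    """
    attn_proj = 4 * tokens * width ** 2           # Q, K, V and output projections
    attn_mix = 2 * tokens ** 2 * width            # QK^T scores and weighted sum over V
    mlp = 2 * mlp_ratio * tokens * width ** 2     # two linear layers in the MLP
    return depth * (attn_proj + attn_mix + mlp) / 1e9


if __name__ == "__main__":
    # Hypothetical configuration: a 32x32 latent, 12 blocks, hidden width 768.
    for patch in (8, 4, 2):
        n = num_tokens(32, patch)
        print(f"patch={patch} tokens={n} approx_gmacs={approx_gmacs(12, 768, n):.2f}")
```

Under these assumptions, halving the patch size quadruples the token count, so compute rises by at least 4x even though depth and width are unchanged. This is one way to read "increased number of input tokens" as a compute axis separate from model size.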
Architectural advances of this kind may translate into real-world improvements in technology, infrastructure, and everyday tools.
This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:
Read Full Paper at OpenAlex