We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. Using these techniques and self-supervised pre-training, we successfully train a strong 3-billion-parameter Swin Transformer model and effectively transfer it to various vision tasks involving high-resolution images or windows, achieving advanced accuracy on a variety of benchmarks.
This summary is based on publicly available metadata and the abstract. The full research paper is available from the original source at OpenAlex.