InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

🤖 Plain-English Summary

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still at an early stage. The effectiveness of the proposed model, InternImage, is demonstrated on challenging benchmarks including ImageNet, COCO, and ADE20K.

🔑 Key Findings

This work presents a new large-scale CNN-based foundation model, termed InternImage, which can benefit from more parameters and training data in the way ViTs do.
Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as its core operator, so the model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also performs adaptive spatial aggregation conditioned on input and task information.
As a result, InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns from massive data with large-scale parameters, as ViTs do.
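
To make "adaptive spatial aggregation" more concrete, here is a minimal pure-Python sketch of the general idea behind a deformable convolution: each point of a 3x3 sampling grid is shifted by an offset, the image is read at the shifted (possibly fractional) position by bilinear interpolation, and the samples are combined with per-point weights. This is an illustration only, not InternImage's actual operator. The names `GRID`, `bilinear_sample`, and `deformable_aggregate` are hypothetical, and the offsets and weights are passed in by hand, whereas a real deformable layer would predict them from the input features.

```python
import math

# Hypothetical 3x3 sampling grid: relative positions around the centre pixel.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear_sample(img, y, x):
    """Read img at a fractional (y, x); positions outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Each neighbour's weight shrinks linearly with its distance.
                value += (1 - abs(y - yi)) * (1 - abs(x - xi)) * img[yi][xi]
    return value


def deformable_aggregate(img, cy, cx, offsets, weights):
    """Sum of weights[k] * img(centre + GRID[k] + offsets[k]) over the grid."""
    total = 0.0
    for (dy, dx), (oy, ox), wk in zip(GRID, offsets, weights):
        total += wk * bilinear_sample(img, cy + dy + oy, cx + dx + ox)
    return total


# Toy 5x5 image whose value at (row, col) is 5 * row + col.
img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
uniform = [1 / 9] * 9

# Zero offsets: behaves like an ordinary 3x3 averaging convolution.
regular = deformable_aggregate(img, 2, 2, [(0.0, 0.0)] * 9, uniform)

# Non-zero offsets: the same output location now reads from a shifted region.
shifted = deformable_aggregate(img, 2, 2, [(0.5, 1.0)] * 9, uniform)
```

In this sketch every grid point shares one offset for simplicity. Giving each point its own offset and weight, chosen per location from the input, is what would let the sampling pattern adapt to image content.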

💡 Why This Matters

This research suggests that CNN-based models can scale with parameters and training data much as ViTs do, with direct implications for downstream vision tasks such as detection and segmentation.

📋 Article Details

Category	🤖 Artificial Intelligence
Published	Jun 01, 2023
Journal	2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors	Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li
DOI	10.1109/cvpr52729.2023.01385
Citations	894
Source	OpenAlex
