Light-weight convolutional neural networks (CNNs) are the de facto standard for mobile vision tasks. On the ImageNet-1k dataset, MobileViT achieves a top-1 accuracy of 78.4% with about 6 million parameters, which is 3.2 and 6.2 percentage points more accurate than MobileNetv3 (CNN-based) and DeIT (ViT-based), respectively, for a similar number of parameters.
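As a quick check on the reported figures, the sketch below backs out the baseline accuracies implied by the stated gaps. It treats the 3.2 and 6.2 gaps as absolute differences in top-1 accuracy (percentage points), which is an assumption about how the comparison is expressed; the helper name `implied_baseline` is hypothetical and used only for illustration.

```python
def implied_baseline(reported_top1: float, gap_points: float) -> float:
    """Return the baseline top-1 accuracy implied by a reported gap.

    Assumes the gap is an absolute difference in percentage points.
    """
    # Round to one decimal place to match the precision of the reported numbers.
    return round(reported_top1 - gap_points, 1)


MOBILEVIT_TOP1 = 78.4  # top-1 accuracy on ImageNet-1k, as quoted in the summary

mobilenetv3_top1 = implied_baseline(MOBILEVIT_TOP1, 3.2)
deit_top1 = implied_baseline(MOBILEVIT_TOP1, 6.2)

print(f"Implied MobileNetv3 top-1: {mobilenetv3_top1}%")
print(f"Implied DeIT top-1: {deit_top1}%")
```

Under that assumption, the implied baselines are roughly 75.2% for MobileNetv3 and 72.2% for DeIT at a similar parameter count.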
This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Oct 05, 2021 |
| Journal | arXiv (Cornell University) |
| Authors | Sachin Mehta, Mohammad Rastegari |
| DOI | 10.48550/arxiv.2110.02178 |
| Citations | 734 |
| Source | OpenAlex |