🤖 Artificial Intelligence OpenAlex

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

📅 Published: October 5, 2021 👤 Sachin Mehta, Mohammad Rastegari 📖 arXiv (Cornell University) 📊 734 citations
AI-Generated Summary

Light-weight convolutional neural networks (CNNs) are the de facto standard for mobile vision tasks. MobileViT is proposed as a light-weight, general-purpose vision transformer for mobile devices. On the ImageNet-1k dataset, MobileViT achieves a top-1 accuracy of 78.4% with about 6 million parameters, which is 3.2 and 6.2 percentage points more accurate than MobileNetv3 (CNN-based) and DeiT (ViT-based), respectively, for a similar number of parameters.
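As a quick check on the comparison above, the reported gaps can be turned into the baseline accuracies they imply. This minimal sketch assumes the gaps are absolute differences in top-1 accuracy (percentage points) rather than relative improvements; the names `implied_baseline`, `GAPS` and `MOBILEVIT_TOP1` are hypothetical helpers for illustration.

```python
# Figures taken from the summary; reading the gaps as absolute percentage
# points (not relative improvements) is an assumption.
MOBILEVIT_TOP1 = 78.4  # % top-1 on ImageNet-1k, about 6M parameters
GAPS = {"MobileNetv3": 3.2, "DeiT": 6.2}  # gap to MobileViT, percentage points


def implied_baseline(top1: float, gap_pp: float) -> float:
    """Top-1 accuracy a baseline must have for MobileViT to lead it by gap_pp points."""
    return round(top1 - gap_pp, 1)  # round away float noise such as 75.20000000000002


baselines = {name: implied_baseline(MOBILEVIT_TOP1, gap) for name, gap in GAPS.items()}
print(baselines)  # {'MobileNetv3': 75.2, 'DeiT': 72.2}
```

Under that reading, the comparison places the similarly sized CNN baseline at about 75.2% and the ViT baseline at about 72.2% top-1.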


Key Findings
  • 1 The spatial inductive biases of light-weight CNNs allow them to learn representations with fewer parameters across different vision tasks.
  • 2 However, these networks are spatially local.
  • 3 To learn global representations, self-attention-based vision transformers (ViTs) have been adopted.
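To make the contrast between spatially local CNNs and global self-attention concrete, the following toy NumPy sketch perturbs one input position and measures how much a distant output position changes after a single layer of each kind. It is not MobileViT's architecture: the sizes, the random weights, the kernel size of 3, the single-head attention, and the helper names `local_conv` and `self_attention` are all assumptions for illustration.

```python
import numpy as np

# Toy illustration only, not MobileViT: sizes and weights are arbitrary assumptions.
N, D = 16, 8  # 16 positions (e.g. pixels along one row), 8 channels each
rng = np.random.default_rng(0)


def local_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Kernel-size-3 1D convolution with zero padding.

    x has shape (N, D); w has shape (3, D, D). Output position i depends only
    on inputs i-1, i and i+1, so its receptive field is spatially local.
    """
    padded = np.pad(x, ((1, 1), (0, 0)))
    return sum(padded[k:k + len(x)] @ w[k] for k in range(3))


def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over all N positions.

    Every output position is a weighted mix of every input position, so its
    receptive field is global after a single layer.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


x = rng.standard_normal((N, D))
w_conv = rng.standard_normal((3, D, D)) / np.sqrt(D)
wq, wk, wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

# Perturb only the last position, far away from position 0.
x_far = x.copy()
x_far[-1] += 1.0

conv_delta = np.abs(local_conv(x_far, w_conv) - local_conv(x, w_conv))[0].max()
attn_delta = np.abs(self_attention(x_far, wq, wk, wv) - self_attention(x, wq, wk, wv))[0].max()

print(f"change at position 0 after one conv layer:      {conv_delta:.3g}")
print(f"change at position 0 after one attention layer: {attn_delta:.3g}")
```

The convolution reports zero change, because one size-3 layer cannot connect positions 0 and 15; stacking such layers widens the receptive field by only two positions per layer. The attention layer reports a nonzero change, since every position attends to every other position in a single step.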
Why It Matters

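This work asks whether the strengths of CNNs (spatial inductive biases, few parameters) and ViTs (global representations) can be combined in a single light-weight network for mobile vision tasks, and reports higher ImageNet-1k accuracy than comparably sized CNN and ViT baselines.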

This summary is based on publicly available metadata and the abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source OpenAlex
Category 🤖 Artificial Intelligence
Published Oct 5, 2021
Journal arXiv (Cornell University)
DOI 10.48550/arxiv.2110.02178
Citations 734
Authors Sachin Mehta, Mohammad Rastegari