⚛️ Physics & Space Science OpenAlex

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

📅 Published: June 1, 2023 👤 Xinyu Liu, Houwen Peng, Ningxin Zheng et al. 📖 CVPR 2023 📊 729 citations
AI-Generated Summary

Vision transformers have shown great success thanks to their strong modeling capability, but their heavy computation cost limits their use in real-time applications. EfficientViT targets this gap with a memory-efficient design. Compared to the recent efficient model MobileViT-XXS, EfficientViT-M2 achieves 1.8% higher accuracy while running 5.8×/3.7× faster on GPU/CPU, and 7.4× faster when converted to ONNX format.
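The title refers to cascaded group attention, the attention module at the core of EfficientViT. As we understand the idea, each attention head receives a different channel split of the input instead of the full feature, and each head's output is added to the next head's input split. The NumPy sketch below illustrates only that idea under stated assumptions: the function name, the weight shapes, and the random weights are hypothetical, and it omits normalization and any other details of the authors' actual module.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cascaded_group_attention(x, wq, wk, wv, wp):
    """Simplified cascaded group attention (hypothetical helper, not the paper's code).

    x      : (N, C) array of N tokens with C channels; C must divide evenly by h
    wq, wk : lists of h arrays, each (C // h, d_k), per-head query/key projections
    wv     : list of h arrays, each (C // h, C // h), per-head value projections
    wp     : (C, C) output projection applied to the concatenated head outputs
    """
    h = len(wq)
    splits = np.split(x, h, axis=1)  # head j only sees channel split j
    head_outputs = []
    prev = None
    for j in range(h):
        # Cascade: from the second head on, add the previous head's output
        # to this head's input split before projecting.
        xj = splits[j] if prev is None else splits[j] + prev
        q, k, v = xj @ wq[j], xj @ wk[j], xj @ wv[j]
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) attention map
        prev = attn @ v  # (N, C // h), also fed into the next head
        head_outputs.append(prev)
    return np.concatenate(head_outputs, axis=1) @ wp


# Usage with hypothetical sizes: 16 tokens, 64 channels, 4 heads, key dim 8.
rng = np.random.default_rng(0)
N, C, h, d_k = 16, 64, 4, 8
x = rng.standard_normal((N, C))
wq = [rng.standard_normal((C // h, d_k)) for _ in range(h)]
wk = [rng.standard_normal((C // h, d_k)) for _ in range(h)]
wv = [rng.standard_normal((C // h, C // h)) for _ in range(h)]
wp = rng.standard_normal((C, C))
y = cascaded_group_attention(x, wq, wk, wv, wp)
print(y.shape)  # (16, 64)
```

In this sketch each head projects from C/h input channels rather than all C, which is where the computation saving comes from, and the cascade lets later heads build on features produced by earlier ones.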


Key Findings
  • The remarkable performance of vision transformers comes with heavy computation costs, which makes them unsuitable for real-time applications.
  • The authors propose a family of high-speed vision transformers named EfficientViT.
  • The speed of existing transformer models is commonly bounded by memory-inefficient operations, especially tensor reshaping and element-wise functions in multi-head self-attention (MHSA).
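One way to see why reshaping and element-wise operations are memory-bound is a roofline-style count of arithmetic intensity, that is, floating-point operations per byte moved to and from memory. The sketch below is our own back-of-the-envelope illustration, not the paper's analysis. It assumes float32 tensors (4 bytes per element), that every operand is read once and every result written once, and it ignores caching; the tensor sizes are hypothetical.

```python
BYTES_PER_FLOAT = 4  # assumption: float32 tensors


def elementwise_add_intensity(n: int) -> float:
    """FLOPs per byte for adding two n-element tensors."""
    flops = n  # one addition per element
    bytes_moved = 3 * n * BYTES_PER_FLOAT  # read two inputs, write one output
    return flops / bytes_moved


def reshape_copy_intensity(n: int) -> float:
    """FLOPs per byte for a reshape/transpose that materializes a copy."""
    flops = 0  # pure data movement, no arithmetic
    bytes_moved = 2 * n * BYTES_PER_FLOAT  # read n elements, write n elements
    return flops / bytes_moved


def matmul_intensity(m: int, k: int, n: int) -> float:
    """FLOPs per byte for an (m x k) @ (k x n) matrix multiplication."""
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = (m * k + k * n + m * n) * BYTES_PER_FLOAT
    return flops / bytes_moved


# Hypothetical feature map: 14 x 14 = 196 tokens with 64 channels.
tokens, channels = 196, 64
print(elementwise_add_intensity(tokens * channels))  # 1/12, about 0.083
print(reshape_copy_intensity(tokens * channels))     # 0.0
print(matmul_intensity(tokens, channels, channels))  # about 13.75
```

Under these assumptions the element-wise add does about 0.08 FLOPs per byte and the copy does none, while even a small matrix multiplication does over a hundred times more work per byte. Since modern accelerators can perform many FLOPs in the time it takes to move one byte, the low-intensity operations spend most of their time waiting on memory, which is consistent with the bottleneck described above.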
Why It Matters

This work identifies memory access, not just raw computation, as a key speed bottleneck in vision transformers, and shows that designing around it can make them practical for real-time applications.

This summary is based on publicly available metadata and the abstract. For the full paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source OpenAlex
Category ⚛️ Physics & Space Science
Published Jun 1, 2023
Journal IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)
DOI 10.1109/cvpr52729.2023.01386
Citations 729
Authors Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu