⚛️ Physics & Space Science OpenAlex

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

📅 June 1, 2023 👤 Xinyu Liu, Houwen Peng, Ningxin Zheng et al. 📖 Research Journal 📊 729 citations

🤖 Plain-English Summary

Vision transformers have shown great success due to their high model capabilities. Compared to the recent efficient model MobileViT-XXS, EfficientViT-M2 achieves 1.8% higher accuracy while running 5.8×/3.7× faster on the GPU/CPU, and 7.4× faster when converted to ONNX format.

🔑 Key Findings

  • Despite their remarkable performance, vision transformers carry heavy computation costs, which makes them unsuitable for real-time applications.
  • The paper proposes a family of high-speed vision transformers named EfficientViT.
  • The speed of existing transformer models is commonly bounded by memory-inefficient operations, especially the tensor reshaping and element-wise functions in multi-head self-attention (MHSA).
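
To make the "memory-bound" finding concrete, here is a rough back-of-envelope sketch of arithmetic intensity (floating-point operations per byte moved). It is not taken from the paper: the function names, the fp32 assumption, and the example shape of 196 tokens by 64 channels are all illustrative, and the model assumes every operand is read from memory once and every result written once.

```python
# Rough arithmetic-intensity estimate (FLOPs per byte moved), assuming fp32
# and that each operand is read once and each result is written once.
# All names and shapes here are hypothetical, chosen only for illustration.

BYTES_PER_FLOAT = 4  # fp32


def elementwise_intensity(num_elements: int, flops_per_element: int = 1) -> float:
    """Unary element-wise op: read one value and write one value per element."""
    flops = num_elements * flops_per_element
    bytes_moved = 2 * num_elements * BYTES_PER_FLOAT
    return flops / bytes_moved


def matmul_intensity(m: int, k: int, n: int) -> float:
    """(m x k) @ (k x n) matrix multiply."""
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = (m * k + k * n + m * n) * BYTES_PER_FLOAT
    return flops / bytes_moved


def reshape_copy_intensity() -> float:
    """A reshape that forces a copy moves data but performs no arithmetic."""
    return 0.0


if __name__ == "__main__":
    tokens, dim = 196, 64  # illustrative sizes, not from the paper
    print("element-wise:", elementwise_intensity(tokens * dim))
    print("matmul Q @ K^T:", matmul_intensity(tokens, dim, tokens))
    print("reshape (copy):", reshape_copy_intensity())
```

Under these assumptions the element-wise op performs far fewer operations per byte than the matrix multiply, and a copying reshape performs none, so on hardware with limited memory bandwidth such steps can dominate runtime even though they contribute little to the FLOP count.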

💡 Why This Matters

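This work shows that vision transformers can be made fast enough for real-time applications by reducing memory-inefficient operations, such as the tensor reshaping and element-wise functions in MHSA, while still improving accuracy over comparable efficient models.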

📋 Article Details

Category ⚛️ Physics & Space Science
Published Jun 01, 2023
Journal Research Journal
Authors Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu
DOI 10.1109/cvpr52729.2023.01386
Citations 729
Source OpenAlex
