🤖 Artificial Intelligence OpenAlex

ConViT: improving vision transformers with soft convolutional inductive biases

📅 November 1, 2022 👤 Stéphane d’Ascoli, Hugo Touvron, Matthew L. Leavitt et al. 📖 Journal of Statistical Mechanics: Theory and Experiment 📊 713 citations

🤖 Plain-English Summary

Convolutional architectures have proven extremely successful for vision tasks. The paper closes with a set of ablations aimed at explaining the success of the ConViT.

🔑 Key Findings

  • The hard inductive biases of convolutional architectures enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling.
  • Vision transformers rely on more flexible self-attention layers and have recently outperformed CNNs for image classification.
  • However, vision transformers require costly pre-training on large external datasets or distillation from pre-trained convolutional networks.

💡 Why This Matters

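This research addresses the trade-off between the sample efficiency that convolutional inductive biases provide and the flexibility of self-attention in vision transformers, which matters wherever large external pre-training datasets are costly or unavailable.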

📋 Article Details

Category: 🤖 Artificial Intelligence
Published: Nov 01, 2022
Journal: Journal of Statistical Mechanics: Theory and Experiment
Authors: Stéphane d’Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli
DOI: 10.1088/1742-5468/ac9830
Citations: 713
Source: OpenAlex
