⚙️ Engineering & Technology OpenAlex

AST: Audio Spectrogram Transformer

📅 August 27, 2021 👤 Yuan Gong, Yu-An Chung, James Glass 📖 Interspeech 2021 📊 977 citations

🤖 Plain-English Summary

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels. To better capture long-range global context, a recent trend is to add a self-attention mechanism on top of the CNN, forming a CNN-attention hybrid model. However, it is unclear whether the reliance on a CNN is necessary, and if neural networks purely based on atten...
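
To make the idea of mapping a spectrogram to labels with attention alone more concrete, here is a minimal sketch in plain Python. It is not the authors' model: the patch size, embedding width, single attention head, mean pooling, and random untrained weights are all assumptions chosen for illustration, the function names are hypothetical, and positional information is omitted for brevity.

```python
import math
import random


def split_into_patches(spec, patch):
    """Cut a 2-D spectrogram (rows = frequency bins, columns = time frames)
    into non-overlapping patch x patch tiles, each flattened to a vector."""
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f in range(0, n_freq - patch + 1, patch):
        for t in range(0, n_time - patch + 1, patch):
            patches.append([spec[f + i][t + j]
                            for i in range(patch) for j in range(patch)])
    return patches


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def classify(spec, patch=4, dim=8, n_labels=3, seed=0):
    """Spectrogram -> patch tokens -> self-attention -> pooled label logits.
    All sizes are illustrative assumptions, not values from the paper."""
    rng = random.Random(seed)
    tokens = split_into_patches(spec, patch)
    x = matmul(tokens, random_matrix(patch * patch, dim, rng))  # embed patches
    attended, weights = self_attention(
        x,
        random_matrix(dim, dim, rng),
        random_matrix(dim, dim, rng),
        random_matrix(dim, dim, rng),
    )
    # Average over tokens so every patch contributes to the prediction.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logits = matmul([pooled], random_matrix(dim, n_labels, rng))[0]
    return logits, weights
```

Calling `classify` on an 8 x 12 toy spectrogram yields 6 patch tokens, a 6 x 6 attention matrix whose rows each sum to 1, and one logit per label. Because every token attends to every other token in a single step, global context is available without stacking convolutional layers, which is the property the summary contrasts with CNN-based models.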

🔑 Key Findings

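  • CNNs have been the main building block of end-to-end audio classification models, which map audio spectrograms directly to labels, for the past decade
  • Recent CNN-attention hybrids add a self-attention mechanism on top of the CNN to capture long-range global context
  • The paper questions whether the CNN is necessary at all and, as its title indicates, studies a Transformer applied to audio spectrograms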

💡 Why This Matters

Better ways to capture long-range context in audio spectrograms can improve end-to-end audio classification, the task of mapping a recording directly to its labels.


📋 Article Details

Category ⚙️ Engineering & Technology
Published Aug 27, 2021
Journal Interspeech 2021
Authors Yuan Gong, Yu-An Chung, James Glass
DOI 10.21437/interspeech.2021-698
Citations 977
Source OpenAlex
