Audio-visual automatic speech recognition (AV-ASR) can improve speech recognition accuracy by using lip images, especially in noisy environments. The recently proposed AV Align system integrates speech and image features through a cross-modal attention mechanism, in which attention weights for the visual features are estimated by using the acoustic features as queries. Although AV Align improves recognition accuracy in environments with background noise, we have observed that the recognition acc...
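To make the cross-modal attention idea concrete, the following is a minimal sketch in dependency-free Python rather than the AV Align implementation. It assumes scaled dot-product scoring, randomly initialised projection matrices (`w_q`, `w_k`, `w_v`), and concatenation as the fusion step. None of these details come from the abstract, and all names and dimensions are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(audio, video, w_q, w_k, w_v):
    """Attend over video frames using audio frames as queries.

    audio: T_a x d_a acoustic features, video: T_v x d_v visual features.
    Returns (fused, weights), where weights is T_a x T_v.
    Assumption: scaled dot-product scoring, not confirmed by the source.
    """
    q = matmul(audio, w_q)                      # queries from the acoustic stream
    k = matmul(video, w_k)                      # keys from the visual stream
    v = matmul(video, w_v)                      # values from the visual stream
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]        # transpose keys to d x T_v
    scores = matmul(q, k_t)                     # T_a x T_v similarity scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)                # visual context per audio frame
    fused = [a + c for a, c in zip(audio, context)]  # concatenate per frame
    return fused, weights


def rand_matrix(rng, rows, cols):
    """Hypothetical helper: random matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T_a, T_v, d_a, d_v, d = 6, 3, 4, 5, 2           # illustrative sizes only
audio = rand_matrix(rng, T_a, d_a)
video = rand_matrix(rng, T_v, d_v)
w_q = rand_matrix(rng, d_a, d)
w_k = rand_matrix(rng, d_v, d)
w_v = rand_matrix(rng, d_v, d)

fused, weights = cross_modal_attention(audio, video, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over the video frames for one audio frame, so the attention acts as a soft alignment between the two streams, which typically run at different frame rates.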
| Field | Value |
| --- | --- |
| Category | ⚛️ Physics & Space Science |
| Published | Jan 01, 2022 |
| Journal | Research Journal |
| DOI | 10.1109/icassp43922.2022 |
| Citations | 925 |
| Source | OpenAlex |