The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.
This research advances how AI systems learn, reason, and solve problems, with direct implications for software, automation, and scientific discovery.
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Aug 23, 2025 |
| Journal | Research Journal |
| Authors | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones |
| DOI | 10.65215/2q58a426 |
| Citations | 6,562 |
| Source | OpenAlex |