Attention Is All You Need
Authors: Vaswani et al. (Google Brain/Research)
Year: 2017
Conference: NIPS 2017
Links:
Crux — My Take
My Take
The Transformer replaced recurrence and convolutions entirely with attention mechanisms. This improved translation quality and, because attention has no step-by-step dependency across sequence positions, made the model far more parallelizable, so it trained in much less time than earlier recurrent and convolutional approaches.
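To make the parallelism point concrete, here is a minimal sketch of the paper's scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python. It is an illustration under simplifying assumptions, not the paper's implementation: a single head, no masking, no learned projection matrices, and made-up toy inputs. The function and variable names are my own.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Returns (outputs, weights). Single head, no mask, no learned
    projections: those are simplifications made for this sketch.
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    outputs, weights = [], []
    # Each query row is computed independently of the others, so no
    # position has to wait for the previous one as in an RNN.
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs, weights


# Toy self-attention over three positions with d_k = d_v = 2 (made-up data).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(x, x, x)
```

In practice the per-query loop is a single matrix multiplication, which is what lets accelerators process every position of the sequence at once.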
Why I Picked This
Reason
The paper showed that attention alone, without recurrence or convolution, is sufficient for sequence modeling, setting the stage for much of the modern NLP/NLG landscape.