The landmark 2017 paper ("Attention Is All You Need") introducing the Transformer architecture that powers modern AI. Replaced recurrence and convolutions with self-attention, enabling massive parallelization and superior performance on sequence tasks. This architecture is the foundation of GPT, BERT, and virtually every large language model, and the paper is among the most-cited AI works of the decade.
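
The self-attention operation at the heart of the paper can be sketched in a few lines. This is an illustrative NumPy version of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, not the paper's full multi-head implementation; the function name and toy shapes are my own choices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarities
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of value rows

# Toy self-attention: 3 tokens with model dimension 4, and Q = K = V = X
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one output vector per input token
```

Because every token attends to every other token in a single matrix multiply, the whole sequence is processed in parallel, which is the parallelization advantage over recurrent models noted above.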