Attention scores how much each input element should influence each output element, so the model focuses on the relevant words or image patches. It is what gives transformers their grasp of context, and its compute cost shapes how long inputs can be.
Definition
The mechanism that lets a model weigh which parts of the input matter most for each output. The core idea that makes transformers work.
Attention scores how much each input element should influence each output element, so the model focuses on the relevant words or image patches. It is what gives transformers their grasp of context, and its compute cost shapes how long inputs can be.
Also known as
self-attention, attention mechanism