📚 node [[attention mechanism]]


Go back to [[Week 2 - Introduction]] or [[Deep Learning]]

The attention mechanism is the part of a deep learning model that determines which parts of an input stream, and how strongly, the model takes into account at the current time-step of its execution.
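As a concrete illustration, a minimal scaled dot-product attention could look like the sketch below (a NumPy sketch; the function name, array shapes, and random inputs are illustrative assumptions, not taken from this note):

```python
# Minimal sketch of scaled dot-product attention (illustrative sizes).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight every input position by its relevance to the current query."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity of the query to each input position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over input positions
    return weights @ V                                        # weighted sum of the input values

# Example: one query attending over a 5-step input with 8-dim features.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 8)
```

The softmax weights are what "accounting for parts of the input" means in practice: positions with higher weight contribute more to the output at the current step.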

This becomes especially important in contexts such as audio processing and language processing, where inputs are long sequences and only parts of them are relevant at any given step.

Different approaches include [[Long short-term memory - LSTM]] cells and [[Gated Recurrent Unit]] cells. GRUs behave like an LSTM with a forget gate, but have fewer parameters, as they lack an output gate.
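The parameter difference is easy to check empirically. The sketch below assumes PyTorch and arbitrary input/hidden sizes; it is only an illustration of the gate-count difference, not part of the original note:

```python
# Compare parameter counts of an LSTM cell stack vs. a GRU stack (illustrative sizes).
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=32, hidden_size=64)  # 4 gate blocks (input, forget, cell, output)
gru = nn.GRU(input_size=32, hidden_size=64)    # 3 gate blocks (reset, update, candidate)

print("LSTM parameters:", count_params(lstm))
print("GRU parameters: ", count_params(gru))
# The GRU has roughly 3/4 of the LSTM's parameters, since it has one fewer gate.
```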

For example, BERT uses a full self-attention mechanism, meaning it applies attention over the entire input at once.
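A small sketch of what "attention over the entire input" means in practice, assuming PyTorch's `nn.MultiheadAttention` (the sequence length, model size, and head count are arbitrary choices, and this is not BERT itself):

```python
# Full (unmasked) self-attention: every token attends over the whole input.
import torch
import torch.nn as nn

seq_len, d_model = 6, 16
x = torch.randn(1, seq_len, d_model)          # one sequence of 6 tokens

attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=2, batch_first=True)
out, weights = attn(x, x, x)                  # self-attention: queries = keys = values = x

print(out.shape)      # (1, 6, 16): one output per token
print(weights.shape)  # (1, 6, 6): each token has a weight on every input position
```

Because no mask is applied, every token's output depends on all positions in the sequence, which is the "full" part of full self-attention.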
