📚 node [[gated recurrent unit]]

Gated Recurrent Unit - GRU

Go back to [[Week 2 - Introduction]] or the [[Main AI Page]]. Part of the pages on [[Artificial Intelligence/Week 2/Natural Language Processing]] and [[Attention Mechanism]].

According to the Illustrated Guide to LSTMs and GRUs, a step-by-step guide:

The GRU is the newer generation of recurrent neural networks and is pretty similar to an LSTM. GRUs got rid of the cell state and use the hidden state to transfer information. A GRU also has only two gates, a reset gate and an update gate.
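To make that structure concrete, here is a minimal sketch of a single GRU step in NumPy. This is my own illustration rather than code from the guide; the weight names (Wz, Uz, bz, ...) and the gate convention follow the common Cho et al. formulation and are assumptions here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU step: no separate cell state, just a hidden state and two gates."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate: how much to refresh the state
    r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate: how much past state feeds the candidate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate hidden state
    h = (1 - z) * h_prev + z * h_cand                  # interpolate between old state and candidate
    return h

# Tiny usage example with random weights (hidden size 3, input size 2)
rng = np.random.default_rng(0)
n_h, n_x = 3, 2
params = [rng.standard_normal(s) for s in
          [(n_h, n_x), (n_h, n_h), (n_h,),   # update gate weights
           (n_h, n_x), (n_h, n_h), (n_h,),   # reset gate weights
           (n_h, n_x), (n_h, n_h), (n_h,)]]  # candidate weights
h = np.zeros(n_h)
for x in rng.standard_normal((5, n_x)):      # run 5 time steps
    h = gru_step(x, h, *params)
print(h)
```

Compare this with the LSTM, which has three gates (forget, input, output) plus a separate cell state.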

Though Wikipedia does mention that:

GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.[6][7]

However, as shown by Gail Weiss, Yoav Goldberg and Eran Yahav, the LSTM is "strictly stronger" than the GRU as it can easily perform unbounded counting, while the GRU cannot. That's why the GRU fails to learn simple languages that are learnable by the LSTM.[8]
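A toy numerical sketch of that counting argument (my own illustration, not from the cited paper): the LSTM's cell state is updated additively, so with a forget gate near 1 it can accumulate without bound, whereas the GRU's hidden state is always a convex combination of its previous value and a tanh candidate, so it stays inside (-1, 1).

```python
import numpy as np

steps = 1000

# LSTM-style counter: the cell state c is additive (forget * c + input * candidate),
# so with forget ≈ 1 and input ≈ 1 it grows roughly linearly with the step count.
c = 0.0
for _ in range(steps):
    c = 1.0 * c + 1.0 * np.tanh(2.0)          # candidate ≈ +0.96 each step
print(f"LSTM-style cell state: {c:.1f}")       # ≈ 964, keeps growing

# GRU-style counter: h is an interpolation between the old h and a tanh candidate,
# so it can never leave (-1, 1) and cannot represent an unbounded count.
h = 0.0
for _ in range(steps):
    z = 0.5                                    # some fixed update-gate value in (0, 1)
    h = (1 - z) * h + z * np.tanh(2.0)
print(f"GRU-style hidden state: {h:.3f}")      # saturates just below tanh(2) ≈ 0.964
```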

Similarly, as shown by Denny Britz, Anna Goldie, Minh-Thang Luong and Quoc Le of Google Brain, LSTM cells consistently outperform GRU cells in "the first large-scale analysis of architecture variations for Neural Machine Translation."[9] 

A GRU and its gates

See my notes on [[Long short-term memory - LSTM]] for an in-depth guide on how these work.

A video on how LSTMs work

An LSTM and a GRU side-by-side
