Gated recurrent units (GRUs) are a variant of recurrent neural networks (RNNs) that address some of the challenges of vanilla RNN implementations, such as vanishing and exploding gradients. The issue with vanilla RNNs is that the same set of weights is applied at every timestep, so the error signal is multiplied by those weights repeatedly as it is sent back through the network during backpropagation, and its value can quickly go to zero. In a similar vein, gradients can become exponentially large, leading to exploding gradients. However, exploding gradients can be handled fairly easily by gradient clipping: if the value of a gradient is larger than a predetermined threshold, it is set to that threshold.
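The clipping rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a prescribed implementation: the function name clip_by_value and the example threshold are hypothetical, and clipping large negative values to the negative threshold is an assumption that goes slightly beyond the one-sided rule stated in the text.

```python
import numpy as np

def clip_by_value(grad, threshold):
    """Clip every gradient entry to the range [-threshold, threshold].

    The symmetric treatment of negative values is an assumption; the
    text only describes capping values that exceed the threshold.
    """
    return np.clip(grad, -threshold, threshold)

# Example gradient with one entry well above the threshold.
grad = np.array([0.5, -3.0, 12.0])
clipped = clip_by_value(grad, threshold=5.0)
print(clipped)
```

Only the entry that exceeds the threshold is changed; the others pass through untouched. Deep learning frameworks often offer a related variant that rescales the whole gradient vector when its norm exceeds the threshold, which preserves the gradient's direction.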
GRUs tackle the problem of vanishing gradients by introducing gates that determine whether the internal state should be remembered or forgotten. The equation that determines the internal state h_t of a gated recurrent unit is presented below:
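One common formulation that matches the description in this section is sketched here. It is an assumption, since conventions differ between sources (some swap the roles of z_t and 1 - z_t). In it, x_t is the input at timestep t, h_{t-1} is the previous state, \odot denotes elementwise multiplication, and W_h, U_h, and b_h are learned parameters named for illustration:

$$
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right)
$$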
Where z_t represents the update gate and is given as:
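In the standard formulation, the update gate is a sigmoid function \sigma of the current input x_t and the previous state h_{t-1}, so each of its entries lies between 0 and 1. The parameter names W_z, U_z, and b_z are assumed for illustration:

$$
z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)
$$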
While r_t represents the reset gate and is given as:
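In the standard formulation, the reset gate has the same form as the update gate but its own parameters, here named W_r, U_r, and b_r for illustration (an assumption about notation). It controls how much of the previous state h_{t-1} is used when computing the new candidate state:

$$
r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)
$$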
Notice how the value of the update gate z_t determines which portions of the equation contribute to the value of h_t. A value of 0 for z_t would knock out the first part of the equation for h_t, while a value of 1 would erase the second portion. This is analogous to how humans remember information they deem important while throwing away irrelevant pieces of information.
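The effect of the update gate can be checked numerically with a single GRU step in NumPy. This is a sketch under stated assumptions rather than a reference implementation: it assumes the common formulation h_t = z_t * h_{t-1} + (1 - z_t) * candidate, the parameters are random, and the names gru_step and z_override are hypothetical (z_override exists only to force the gate to 0 or 1 for the demonstration).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params, z_override=None):
    """One GRU step; returns the new state and the candidate state."""
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = params
    # Update gate: how much of the previous state to keep.
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)
    if z_override is not None:
        # Demonstration only: force every gate entry to a fixed value.
        z_t = np.full_like(h_prev, z_override)
    # Reset gate: how much of the previous state feeds the candidate.
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)
    h_cand = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)
    # Blend the previous state (first part) and the candidate (second part).
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand
    return h_t, h_cand

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = []
for _ in range(3):  # update gate, reset gate, candidate state
    params += [rng.normal(scale=0.5, size=(n_hid, n_in)),
               rng.normal(scale=0.5, size=(n_hid, n_hid)),
               np.zeros(n_hid)]

x_t = rng.normal(size=n_in)
h_prev = rng.normal(size=n_hid)

h_keep, _ = gru_step(x_t, h_prev, params, z_override=1.0)     # second part erased
h_new, h_cand = gru_step(x_t, h_prev, params, z_override=0.0)  # first part knocked out
h_mix, _ = gru_step(x_t, h_prev, params)                       # learned gate, in between
```

With the gate forced to 1 the previous state is carried forward unchanged, and with the gate forced to 0 the new state equals the candidate. In a trained network the gate is a sigmoid output, so each entry sits strictly between these extremes and the two parts are blended.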