Introduction to RNN
RNN stands for Recurrent Neural Network. RNNs first appeared in the 1980s, and they have since been improved by researchers around the world. In traditional neural networks, all the inputs and outputs are independent of each other. In RNNs, by contrast, the output of the previous step is fed to the current step, which adds a memory component to the network.
Hence, when presented with a sequence of data, an RNN examines the elements of the sequence one by one and retains a memory of the sequence, so that this memory can be used as an input while examining the next element.
Representation of a single RNN unit
To understand its functionality better, let's compare this with the way our brain works while we read a text, whether as short as a sentence or as long as a book or document.
While reading a text, our brain actually reads it word by word. Yet our brain is able to understand the meaning conveyed by the text because it remembers the previous words while processing the next ones.
This is exactly how RNNs can associate a sequence of inputs with an output.
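As a minimal sketch (plain NumPy with randomly initialised weights standing in for learned parameters, not a production implementation), a single RNN step that feeds the previous hidden state back into the current step might look like this:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN time step: combine the current input with the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(42)
n_in, n_h = 3, 5                                 # hypothetical input and hidden sizes
W_x = rng.standard_normal((n_h, n_in)) * 0.1
W_h = rng.standard_normal((n_h, n_h)) * 0.1
b = np.zeros(n_h)

# Process a sequence element by element; the hidden state h carries the
# "memory" of everything seen so far into the next step.
h = np.zeros(n_h)
for x_t in rng.standard_normal((4, n_in)):       # a toy sequence of 4 inputs
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h)
```

The same loop, applied word by word, is what lets an RNN carry context from earlier parts of a sentence into later ones.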
Applications of RNN
RNN models are widely used in NLP and speech recognition. Speech recognition is the process in which speech wave data is taken as input and converted into a machine-learnable format so that the words and phrases it contains can be identified. A simple way to imagine speech recognition is inputting audio clips and outputting the transcript. The speech is broken down into chunks and passed through the RNN for word/phrase identification. Systems like Alexa and Siri use RNNs for speech recognition.
RNN models can also be used for recognizing named entities present in a text document.
They are very useful in sentiment classification as well.
RNNs are used in machine translation. Machine translation is the process in which a system translates text or speech from one language to another.
Limitation of RNN
The problem with the simple RNN model lies in the gradient of the loss function and is called the vanishing gradient problem. Gradients carry the information used to update the RNN's parameters, and when the gradient becomes too small, the parameter updates become insignificant. So learning long data sequences becomes difficult with a simple RNN.
Although RNN models are quite powerful and efficient in handling sequential data, they face this challenge when dealing with long-term dependencies in the input: the vanishing gradient effectively stops the learning for long data sequences.
As we know, during backpropagation the weights are adjusted using the loss function and an optimization algorithm. Gradient descent can be considered the base optimization algorithm. It updates the weights and biases with the following rule,
w = w − η·∂L/∂w ;  b = b − η·∂L/∂b

where η is the learning rate, ∂L/∂w is the partial derivative of the loss L with respect to w, and ∂L/∂b is the partial derivative of L with respect to b.
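As a tiny illustration of this update rule (using a toy loss and hypothetical gradient functions, not the gradients an RNN would actually backpropagate), the weight and bias update looks like this in Python:

```python
# Hypothetical gradients of a toy loss L = (w - 3)^2 + (b + 1)^2
# (in a real network these would come from backpropagation).
def grad_L_wrt_w(w, b):
    return 2 * (w - 3.0)

def grad_L_wrt_b(w, b):
    return 2 * (b + 1.0)

w, b = 0.0, 0.0
eta = 0.1                                   # learning rate

for _ in range(100):
    # Gradient-descent update rule: w = w - eta * dL/dw ; b = b - eta * dL/db
    w = w - eta * grad_L_wrt_w(w, b)
    b = b - eta * grad_L_wrt_b(w, b)

print(w, b)                                 # approaches w ≈ 3, b ≈ -1
```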
In recurrent neural networks, the update to a weight in each iteration is proportional to the partial derivative of the error function with respect to that weight. The problem is that this gradient can become very small and prevent the weight from changing its value. When the gradient becomes smaller and smaller, the weight updates become insignificant, which means that the neural network stops learning.
Let us assume that, for some time step k, the gradient becomes almost zero:

∂L/∂w ≈ 0

The weight update will then be

w = w − η·∂L/∂w ≈ w − 0 ≈ w

As we can see, the error gradient effectively vanishes and there is no significant learning.
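As a rough numerical sketch (not the full backpropagation-through-time derivation), the gradient that reaches an early time step is roughly a product of per-step factors such as the recurrent weight times the tanh derivative; when those factors are below 1, the product shrinks toward zero as the sequence gets longer:

```python
import numpy as np

np.random.seed(0)

# Toy illustration: assume a recurrent weight of 0.5 and tanh activations,
# so each step contributes a factor w_h * tanh'(.) that is at most 0.5.
w_h = 0.5
per_step_factors = [w_h * (1 - np.tanh(np.random.randn()) ** 2) for _ in range(50)]

grad = 1.0
for t, factor in enumerate(per_step_factors, start=1):
    grad *= factor
    if t in (1, 10, 25, 50):
        print(f"gradient after {t:2d} steps: {grad:.2e}")
# The gradient shrinks by many orders of magnitude, so the weight updates
# that would capture long-range dependencies become insignificant.
```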
As a result, it becomes difficult for an RNN to predict a word that depends on context far back in the sequence. We can overcome this problem with the LSTM network, which can remember information for a longer period of time.
Introduction to LSTM
Although RNNs enable the modelling of time-dependent, sequential-data tasks like stock market prediction and machine translation, they suffer from the problem of vanishing gradients, which impedes learning when long data sequences are given as input. The gradients carry the information used to update the RNN's parameter values during the learning phase of the model.
Hence, when the gradient becomes smaller and smaller, the process of updating the parameter values becomes insignificant, which means that no real learning is done. This is a major drawback of the RNN model.
Long Short-Term Memory networks (LSTMs) are a special kind of RNN capable of learning long-term dependencies. LSTMs are explicitly designed to overcome the vanishing gradient problem.
All recurrent neural networks have the form of a chain of repeating modules of neural networks. In standard RNNs, this repeating module has a very simple structure, whereas an LSTM has a different structure with three gates that perform specific functions.
LSTM gates:
- The forget gate – decides which information is to be erased from the LSTM cell
- The input gate – decides which information is going to be stored within the LSTM cell
- The output gate – decides which information is to be presented at the output
LSTM Representation
In the LSTM cell, the vanishing gradient problem is addressed by writing the current state into a cell of the network. This writing process is regulated by the input and forget gates. Just like how we humans write something down if we want to remember it, LSTMs also store the information that they want to remember. However, writing down every step is tedious and might blow up the memory if the “writing down” process is not regulated.
Hence, to tackle this problem, LSTMs operate in a selective fashion for the activities below (a single LSTM step combining these operations is sketched in code after the list).
- Write: During the learning phase, the network figures out the important features and thus stores them.
- Read: Only the features that are relevant to the current state are read by the LSTM cell.
- Forget: The information that is no longer necessary is erased from the LSTM cell.
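As a minimal sketch (plain NumPy, with randomly initialised weights standing in for learned parameters and hypothetical layer sizes), a single LSTM step combining the three gates and the cell state might look like this:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: returns the new hidden state and cell state."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x_t])           # previous hidden state + current input

    f = sigmoid(Wf @ z + bf)                    # forget gate: what to erase from the cell
    i = sigmoid(Wi @ z + bi)                    # input gate: what new information to store
    o = sigmoid(Wo @ z + bo)                    # output gate: what to present at the output
    c_tilde = np.tanh(Wc @ z + bc)              # candidate cell contents ("write")

    c_t = f * c_prev + i * c_tilde              # update the cell state
    h_t = o * np.tanh(c_t)                      # read out the new hidden state
    return h_t, c_t

# Toy usage with hypothetical sizes: input dimension 3, hidden dimension 4
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
params = [rng.standard_normal((n_h, n_h + n_in)) * 0.1 for _ in range(4)] + \
         [np.zeros(n_h) for _ in range(4)]

h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.standard_normal((5, n_in)):      # a toy sequence of 5 input vectors
    h, c = lstm_step(x_t, h, c, params)
print(h)
```

The additive cell-state update `c_t = f * c_prev + i * c_tilde` is the part that lets gradients flow across many time steps without vanishing the way they do in a simple RNN.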
Also, bidirectional LSTMs (more generally, bidirectional RNNs) are neural network models in which two independent RNNs are put together. This structure allows the network to have both past and future information about the sequence at every time step.
In bidirectional LSTMs, the input sequence is run in two directions: one pass from the past to the future and another from the future to the past.
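As a minimal sketch (using a simple tanh RNN cell for brevity instead of the full LSTM step shown above, with hypothetical weights and sizes), the two passes can be combined like this:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # Simple tanh RNN cell used here for brevity; a bidirectional LSTM would
    # use the LSTM step from the previous sketch in place of this function.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(1)
n_in, n_h, T = 3, 4, 6
xs = rng.standard_normal((T, n_in))              # an input sequence of length 6

# Two independent sets of weights, one per direction (illustrative values)
fwd = (rng.standard_normal((n_h, n_in)) * 0.1, rng.standard_normal((n_h, n_h)) * 0.1, np.zeros(n_h))
bwd = (rng.standard_normal((n_h, n_in)) * 0.1, rng.standard_normal((n_h, n_h)) * 0.1, np.zeros(n_h))

# Forward pass: past -> future
h = np.zeros(n_h)
h_fwd = []
for x_t in xs:
    h = rnn_step(x_t, h, *fwd)
    h_fwd.append(h)

# Backward pass: future -> past
h = np.zeros(n_h)
h_bwd = [None] * T
for t in reversed(range(T)):
    h = rnn_step(xs[t], h, *bwd)
    h_bwd[t] = h

# At every time step, concatenate the two hidden states so the output
# sees both past and future context.
outputs = [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
print(outputs[0].shape)                          # (8,) = 2 * hidden size
```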
Comparison of RNN and LSTM
| RNNs | LSTMs |
| --- | --- |
| RNNs are a type of neural network in which loops are present, allowing information to be stored within the network. | LSTMs are a variant of RNN. They can capture long-term temporal dependencies, overcoming the optimization hurdles present in simple recurrent networks (SRNs). |
| RNNs have hidden states that serve as their memory. There is no concept of a cell state. | LSTMs have both cell states and hidden states. The cell state removes or adds information to the cell, regulated by the gates. Thanks to this cell state, LSTMs capture long-term temporal dependencies. |
| RNNs can be used for speech recognition, machine translation, and language modelling. | LSTMs can be used for handwriting recognition, language modelling, language translation, acoustic modelling of speech, and analysis of audio and video data. |
Written by: Ganesh Hari
Reviewed by: Savya Sachi