In this lecture we'll focus on deep learning using RNNs, Recurrent Neural Networks. RNN applications include speech recognition; representative examples are smartphone technologies like Apple's Siri, Google's Voice Search, and Samsung's S Voice. Other applications include handwriting recognition, sequence data analysis, and program code generation, where an RNN automatically generates computer programming code that serves a predefined functional objective.

An RNN structure uses a neural network with directed cyclic connections between neurons. Directed cyclic connections create internal states with dynamic temporal characteristics, and this internal memory is used to process arbitrary input data sequences. Sequence modeling is used for data sequence classification and clustering. A sequence modeling structure based on sequence-to-sequence learning is shown down there, where N inputs are transformed into M outputs. You can see in the example that there are 6 inputs at the input, and at the output we have 4 outputs.

The RNN process works like this. In step 1, data is input to the Input Layer. In step 2, a representation of the data in the Input Layer is computed and sent to the Hidden Layer. In step 3, the Hidden Layer conducts sequence modeling and training in the forward or backward direction. In step 4, multiple Hidden Layers using forward or backward direction sequence modeling and training can be used. Finally, the last Hidden Layer sends the processed result to the Output Layer.

To give you a visual example of a Forward RNN and a Backward RNN, we'll use this example. First, the Forward RNN. Look at the neural structure right here: the Input Layer and the Output Layer are on the outside, and in the middle is the RNN layer, which is the Hidden Layer. Related memory is used from the hidden layer connected to the output layer.
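The layered process just described can be sketched as a minimal forward pass in Python with NumPy. This is only an illustrative sketch: the layer sizes, weight names, and random initialization are my own assumptions, not the lecture's exact network. Running the same loop over the reversed sequence gives the backward direction.

```python
import numpy as np

# Minimal forward-RNN pass (illustrative; sizes and names are assumptions).
# Steps 1-2: each input reaches the hidden layer; steps 3-5: the hidden
# layer updates its internal state (the memory) and the final result is
# sent to the output layer.

def rnn_forward(inputs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a simple RNN over a sequence of input vectors."""
    h = np.zeros(W_hh.shape[0])                  # internal state (memory)
    outputs = []
    for x in inputs:                             # forward sequence direction
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # hidden-layer update
        outputs.append(W_hy @ h + b_y)           # output-layer read-out
    return outputs, h

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2
W_xh = rng.standard_normal((n_hidden, n_in))
W_hh = rng.standard_normal((n_hidden, n_hidden))
W_hy = rng.standard_normal((n_out, n_hidden))

seq = [rng.standard_normal(n_in) for _ in range(4)]      # X1..X4
outs, h_last = rnn_forward(seq, W_xh, W_hh, W_hy,
                           np.zeros(n_hidden), np.zeros(n_out))
# A backward RNN would simply consume the sequence in reverse:
outs_bwd, _ = rnn_forward(seq[::-1], W_xh, W_hh, W_hy,
                          np.zeros(n_hidden), np.zeros(n_out))
```

Stacking several such hidden layers, some forward and some backward, gives the deep RNN configurations the lecture describes.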
For the data input in the input layer, the sequence direction is as you see by the arrows, with more and more data coming in. We can label the input data sequence with the symbols X1 through X4, as shown down there. The RNN layer works in the same direction as the data input in the input layer, as those arrows show, and that's why we call it a Forward RNN. This is the part where sequence modeling is actually conducted. In a Backward RNN, you can see that the hidden layer operates in the sequence direction opposite to the Forward RNN, and that's where the name Backward RNN comes from.

In Sequence to Sequence (S2S) deep learning RNNs, assuming an input data sequence of Data 1, Data 2, Data 3 applied to the RNN, a hidden layer may take the form of a Forward RNN or a Backward RNN. When there are multiple hidden layers, and there will be many because this is a deep neural network, each hidden layer can be set to be forward or backward based on how we want the network to perform.

In RNN processing, an important technology is representation, which is applied to the input layer data. It's similar to subsampling, the pooling process used in CNNs. Representation is used to extract the important data that characterizes the data set in the best way; it is a non-linear down-sampling process.

Here are some examples of representation. First is the center, where the center value is selected. For example, among the 9 numbers over there, 5 is in the middle position, so it is the center value. Next is the median, where, after lining up the data sequence from the largest to the smallest, the middle value is selected. For example, if you line up the numbers 13, 11, 10, 9, 7, 5, 3, 2, 1, then 7 is in the middle, so the median value is 7. Then there is the average value.
Among these numbers, the average is approximately 6.8: if you add all nine numbers up, you get 61, and dividing by 9 gives about 6.8. Next is max pooling, where the max value is used. Among those 9 numbers, the largest one is 13, so max pooling will use 13 as the representation value.

Then there is the weighted sum. The data values are d1, d2, and so on up to d9, and the weights are w1, w2, all the way to w9, so we have nine data values and nine weights. We can create a weighted sum v by mapping each weight to its corresponding data value and then adding everything up. This is like an element-wise multiplication of the weight vector and the data vector, followed by a sum, and this operation computes the weighted sum value v.

Then there is context-based projection. This uses assisting data in the representation process, and the assisting data is called the context. Context data may be the original data input, or it may be biased or transformed data. For example, here we have the data set and its representation, which is multiplied by a weight and passed up to the RNN sub-layer to be processed into another value. The context data is multiplied by the context weight Wc, and that influences the representation value multiplied by its weight in the process inside the hidden layer.

Context-based projection can also use a larger structure. As you can see here with Data 1, Data 2, and Data 3, we have three sets, each with representation values and a weight W. Then there is the context with its context weight, and the hidden layer operation results in this value right here, where the current input Data 3 and the past information are both used in the process.
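The representation operations above can be sketched in plain Python. The function names are my own labels, and the sample values are the nine-number example from the lecture:

```python
# Sketches of the representation operations (illustrative; the function
# names are informal labels, not fixed terminology).

def center(values):
    """Select the value at the center position of an odd-length window."""
    return values[len(values) // 2]

def median(values):
    """Line the values up in order, then take the middle one."""
    return sorted(values)[len(values) // 2]

def average(values):
    """Add all the values up and divide by how many there are."""
    return sum(values) / len(values)

def max_pool(values):
    """Use the largest value as the representation."""
    return max(values)

def weighted_sum(values, weights):
    """v = w1*d1 + w2*d2 + ...: element-wise multiply, then add up."""
    return sum(w * d for w, d in zip(weights, values))

data = [13, 11, 10, 9, 7, 5, 3, 2, 1]           # the lecture's example
print(center([1, 2, 3, 4, 5, 6, 7, 8, 9]))      # 5
print(median(data))                             # 7
print(round(average(data), 1))                  # 6.8
print(max_pool(data))                           # 13
print(weighted_sum([1, 2, 3], [0.5, 0.25, 0.25]))  # 1.75
```

Note that with equal weights of 1/9, the weighted sum reduces to the ordinary average.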
Attention enables the decoder to attend to different parts of the source data segment at various steps of the output generation. A good example can be found in language translation, where an RNN attends to sequential input states, to multiple words simultaneously, and to words in different orders when producing the output translated language.

Looking into the process of representation with attention, the SoftMax function transforms the values a1, a2, and a3 into tilde a1, tilde a2, and tilde a3 correspondingly, where tilde a1 + tilde a2 + tilde a3 add up to 1. The attention values a1, a2, and a3 represent the importance of each data set. As you can see here, we have the inputs x1, x2, and x3, and the attention unit holds a1, a2, and a3, which are used in this type of structure. The SoftMax values tilde a1, tilde a2, and tilde a3, together with the weights w1, w2, and w3, are used to transform the values. Here the SoftMax values are computed, resulting in tilde a1, tilde a2, and tilde a3, and once again they add up to 1. Each is then multiplied by its corresponding weight w1, w2, or w3, and this is used to transform each data set.

For example, we have data sets 1, 2, and 3 over there, which are the original data sets, and the attentions a1, a2, and a3. In the middle, you can see the sky-blue circle, which is the SoftMax operation. From the SoftMax we get the outputs tilde a1, tilde a2, and tilde a3, which are multiplied by the weights w1, w2, and w3 like that. Element-wise multiplication is used at that stage, and what comes out are the data sets 1, 2, and 3, now transformed.

RNN types include the Fully Recurrent Neural Network, FRNN. This is an early model in which all neurons have connections to other neurons with modifiable weights, and the neurons form input, hidden, and output layers. Then there is LSTM, the Long Short-Term Memory RNN. This is currently the most popular RNN model.
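To recap the attention computation described above in code form, here is a small sketch in plain Python; the score values, weights, and toy data sets are assumptions made only to keep the example concrete.

```python
import math

# Sketch of attention weighting: raw scores a1..a3 go through SoftMax so
# the tilde values are positive and add up to 1, then each data set is
# scaled element-wise by its weight times its SoftMax value.

def softmax(scores):
    """Turn raw attention scores into values that add up to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 3.0]            # a1, a2, a3 (assumed values)
tilde = softmax(scores)             # tilde a1..a3, summing to 1
weights = [0.5, 1.0, 1.5]           # w1, w2, w3 (assumed values)
data = [[1, 1], [2, 2], [3, 3]]     # three toy data sets

# More "important" data sets (larger scores) are emphasized.
transformed = [[w * t * x for x in d]
               for w, t, d in zip(weights, tilde, data)]

print(round(sum(tilde), 6))   # 1.0
```

The element-wise scaling at the end corresponds to the transformation stage in the lecture's diagram.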
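Before the LSTM gates are discussed in more detail, here is a minimal single-step LSTM cell with scalar weights; the parameter layout and values are my own assumptions, so treat this as a sketch rather than a reference implementation.

```python
import math

# One LSTM time step with scalar weights (illustrative sketch). The forget
# gate f scales the self-loop on the cell state c, the input gate i admits
# new information, and the output gate o controls what is exposed.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """p maps each gate name to (input weight, recurrent weight, bias)."""
    f = sigmoid(p["f"][0] * x + p["f"][1] * h_prev + p["f"][2])    # forget gate
    i = sigmoid(p["i"][0] * x + p["i"][1] * h_prev + p["i"][2])    # input gate
    o = sigmoid(p["o"][0] * x + p["o"][1] * h_prev + p["o"][2])    # output gate
    g = math.tanh(p["g"][0] * x + p["g"][1] * h_prev + p["g"][2])  # candidate
    c = f * c_prev + i * g        # self-loop: memory carried across steps
    h = o * math.tanh(c)          # hidden state sent onward
    return h, c

# Run the cell over a short sequence (assumed parameter values).
params = {name: (1.0, 0.5, 0.0) for name in ("f", "i", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
```

The additive update of c through the forget gate is what lets errors flow backwards without vanishing or exploding, as described next.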
LSTM cells have an Input Gate, an Output Gate, a Forget Gate, and a Self-Loop, and they may have many more gates. The Self-Loop in LSTM cells gives a data sequence memory effect. Here is one example; there are many other models, but in this example model you can see the input over there as x, then the input gate, the forget gate, and the output gate, and in red over here is the self-loop.

An LSTM recurrent gate, which is the forget gate, has these types of effects: it prevents backpropagated errors from vanishing or exploding. Vanishing refers to the vanishing gradient problem, and exploding refers to the divergence problem; both are prevented. LSTM recurrent gates, which are the forget gates, enable errors to flow backwards through an unlimited number of virtual layers, extending the memory characteristics. LSTM is therefore effective on data sequences that require memory of far past events, such as events thousands of discrete time steps ago. LSTM RNNs perform well on data sequences with long delays and on mixed signals with high and low frequency components.

RNN applications that use LSTM technology include speech recognition and large-vocabulary speech recognition, pattern recognition, connected handwriting recognition, text-to-speech synthesis, recognition of context-sensitive languages, machine translation, language modeling, multilingual language processing, and automatic image captioning, which uses LSTM RNN and CNN technology together.

These are the references that I used, and I recommend them to you. Thank you.