Welcome back. In the last session we defined the architecture of our LSTM network. As a short reminder: we defined an input layer, two LSTM layers, and an output dense layer. At first sight this is a pretty simple architecture. We have four layers which are connected to each other, as we know them from other neural networks, densely connected from the input to the output.

But now, let us have a closer look at the LSTM layers. To reduce the complexity, let us assume that our layer consists of only one node, just for explanatory purposes. So now we are looking at one layer which has one node.

Let's have a look. This is a picture which I have copied from the homepage of the Redwood Center for Theoretical Neuroscience, a center at UC Berkeley, and it comes from the blog of Brian Cheung. Now, what do we have here? You see a timeline with seven time steps. In our network we have ten time steps, but you can imagine that we simply add three more here. We have an input layer, denoted as inputs, we have a hidden layer, and we have an output layer, outputs. This is the same architecture as ours: we have an input layer, we have the LSTM layers (but for now we think of only one layer), and we have the output layer, which is the dense layer.

What we see here is that the LSTM layer is unrolled in time. What does it mean, unrolled or unfolded in time? It means the following: these are all copies of the same LSTM node. Here we have seven copies; in our case we would have ten. So the LSTM creates a copy of itself for every time step. And this hidden layer, which here has only one node (we have reduced it to one node, but it can have many more, say 10, 20, 40, whatever you like), we think of as one single node, to simplify the whole story. So, for every time step, the LSTM creates, as I said, a copy of itself.

How does the data flow? We see that the data at the input node at a certain time step, for example one, two, three, four and so on, flows to the hidden layer. And here you see these symbols, the circles and the minus signs. These symbols denote so-called gates, or valves. In the next explanation we will see what these valves are, but for now just think of them as valves. Here you see a circle, an open pipe: if you look at the first time step, the data can flow through and arrives at the black node in the hidden layer. Then you see a kind of minus sign, and the data is blocked here; it cannot flow to the output. That was the first time step.

Let's have a look at the second time step. The information flows from the input to the hidden layer, but again it is blocked on the way to the output. If we look at the path from time step one to time step two, we see that information can flow: this valve at the second time step is open, so the information flows there. Then it flows to time step three, then it flows to time step four, and only here is the valve which allows the flow to the output open again.
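Before we go deeper, here is a minimal Keras sketch of the kind of architecture we are talking about: an input with ten time steps, two stacked LSTM layers, and a dense output. The unit counts, the feature count, and the batch size below are placeholders for illustration, not the exact values from our notebook.

```python
# Minimal sketch: input with ten time steps, two LSTM layers, dense output.
# Unit counts, feature count and batch size are placeholders, not the values
# from the course notebook.
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import LSTM, Dense

timesteps = 10   # our ten time steps
features = 1     # one value per time step (placeholder)
batch_size = 1   # a stateful LSTM needs a fixed batch size

model = Sequential([
    Input(shape=(timesteps, features), batch_size=batch_size),
    # First LSTM layer: returns its hidden state for every time step,
    # so the second LSTM layer again receives a sequence.
    LSTM(32, return_sequences=True, stateful=True),
    # Second LSTM layer: returns only the hidden state of the last time step.
    LSTM(32, stateful=True),
    # Dense output layer.
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```

Keras unrolls each LSTM layer over the ten time steps internally; the picture simply makes that unrolling visible.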
So now let's have a look at such a single node at one time step: the anatomy of an LSTM node. This is the same picture as above, and what we have here explains a lot, also of what we have talked about in the previous sessions. You see here the cell state and the hidden state. These two are very often confused, but I find this explanation very clear.

What do we have here? We have the Cell State. The Cell State is actually just the inner memory: it is the state of the LSTM node's memory, which lives inside this node. This Cell State, or cell memory, is surrounded by three gates.

One is the Input Gate. The Input Gate is the valve which controls the information flow from the input to the Cell State. It is the gate which can let information into the Cell State, or block it so that it cannot pass. You can see here that only at the first time step can the information flow into the hidden layer.

Then we have the Forget Gate. This gate stands between the different time steps within the hidden layer. If you look at the hidden layer here, you will see the circle to the left of each node; this is the valve, the Forget Gate. Here it is open at the first time step, open again at the second, open at the third, fourth, fifth and sixth, but closed at the seventh. So the information cannot flow from the sixth time step within the hidden layer to the seventh.

And then we have the Output Gate, or output valve. This valve controls the information flow to the outside, to the output. You see the output layer here: for every time step, one output is produced. And you see that the information is blocked at time step one, blocked again at time step two, blocked at time step three, and open at time step four, so the information can flow to the output. And this output is the Hidden State.

Why is it called hidden? What is hidden about it? It refers to the output of the hidden layer, "hidden" meaning that this is the layer which lies beneath the input. But there is more to it being hidden: this output accumulates a lot of the processes which happen over time. The whole flow of information from time step one to time step six is reflected in this output. We see the output gate open at time step four and at time step six, but what the network outputs there is the result of the processes which happened in the time steps before that. And this is one of the reasons why it is called the hidden state.

So, this is the short explanation. Just remember the main things here: the Cell State is the memory state, the inner state of the cell, and it is not always passed on to the next layer. And remember that we have defined our LSTM as stateful: we have set stateful to true, it returns sequences, and the state, the Cell State, is carried over as well. Why? We have said that batches are the training units, and after one batch is completed, the next batch is initialized, not with zeros, as is the case in stateless neural networks, but with the Hidden State and the Cell State from the previous batch. And this is what we have to remember.

Now, if you would like to know more, or in more detail how it all works, I mean every single gate, how it is computed, and so on, it is all described very well on the Internet and there are several great articles out there. But I would recommend that you look at the article on Colah's blog; it is a really great article and a good reference.

So with this, stay tuned and enjoy our sessions. See you. Bye bye.
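As a small companion to Colah's article, here is a toy, single-node LSTM unrolled over seven time steps in plain NumPy, just to make the gates and the two states from this session tangible. The weights are random placeholders and everything is scalar; this is a sketch of the standard LSTM equations, not code from our project.

```python
# Toy single-node LSTM, unrolled over seven time steps.
# Random placeholder weights; purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# One scalar weight per gate, for the input x and for the previous hidden state h.
# Order: input gate, forget gate, output gate, candidate cell content.
W = rng.normal(size=4)   # input weights
U = rng.normal(size=4)   # recurrent weights
b = np.zeros(4)          # biases

x_seq = rng.normal(size=7)   # seven input values, one per time step
h, c = 0.0, 0.0              # hidden state and cell state, zero before the first batch

for t, x in enumerate(x_seq):                      # the same node, "copied" for every time step
    i = sigmoid(W[0] * x + U[0] * h + b[0])        # input gate: lets new information in
    f = sigmoid(W[1] * x + U[1] * h + b[1])        # forget gate: keeps or erases the old cell state
    o = sigmoid(W[2] * x + U[2] * h + b[2])        # output gate: lets the result out
    c_tilde = np.tanh(W[3] * x + U[3] * h + b[3])  # candidate cell content

    c = f * c + i * c_tilde    # cell state: the inner memory of the node
    h = o * np.tanh(c)         # hidden state: what the node shows to the outside
    print(f"t={t}  cell state={c:+.3f}  hidden state={h:+.3f}")

# Stateless network: h and c are reset to zero before the next batch.
# Stateful network (stateful=True): the next batch starts from these h and c.
```

If you run it, you can watch the cell state carry information from step to step, while the hidden state is what would be handed on to the dense output layer.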