By now you've seen most of the key building blocks of an RNN. There are just two more ideas that let you build much more powerful models. One is the bidirectional RNN, which lets you, at a given point in time, take information from both earlier and later in the sequence. We'll talk about that in this video. The second is the deep RNN, which you'll see in the next video. So let's start with bidirectional RNNs.

To motivate bidirectional RNNs, let's look at this network, which you've seen a few times before in the context of named entity recognition. One of the problems with this network is that, to figure out whether the third word, Teddy, is part of a person's name, it's not enough to just look at the first part of the sentence. To tell whether ŷ⟨3⟩ should be 0 or 1, you need more information than just the first few words, because the first three words don't tell you whether we're talking about teddy bears or about the former US president, Teddy Roosevelt. So this is a unidirectional, or forward-only, RNN. And the comment I just made is true whether these cells are standard RNN blocks, GRU units, or LSTM blocks; all of these blocks run in a forward-only direction.

What a bidirectional RNN, or BRNN, does is fix this issue. A bidirectional RNN works as follows. I'm going to use a simplified four-word sentence, so we have four inputs, x⟨1⟩ through x⟨4⟩. This network's hidden layer will have a forward recurrent component, which I'll call a→⟨1⟩, a→⟨2⟩, a→⟨3⟩, and a→⟨4⟩, with a right arrow drawn over each to denote the forward recurrent component, and they're connected like this. Each of these forward recurrent units takes in the current x⟨t⟩ and then feeds forward to help predict ŷ⟨1⟩, ŷ⟨2⟩, ŷ⟨3⟩, and ŷ⟨4⟩. So far I haven't done anything new; this is basically the RNN from the previous slide, just with the arrows drawn in slightly funny positions. But I drew the arrows in these slightly funny positions because what we're going to do next is add a backward recurrent layer. This has a←⟨1⟩, with a left arrow to denote a backward connection, then a←⟨2⟩, a←⟨3⟩, and a←⟨4⟩. We then connect the network up as follows, and these backward activations are connected to each other, going backwards in time. Notice that this network still defines an acyclic graph.

So, given an input sequence x⟨1⟩ through x⟨4⟩, the forward sequence first computes a→⟨1⟩, then uses that to compute a→⟨2⟩, then a→⟨3⟩, then a→⟨4⟩, whereas the backward sequence starts by computing a←⟨4⟩ and then goes back to compute a←⟨3⟩. Notice that you're computing network activations here; this is not backpropagation, this is forward propagation, but part of the forward propagation goes from left to right and part goes from right to left in this diagram. Having computed a←⟨3⟩, you can then use those activations to compute a←⟨2⟩, and then a←⟨1⟩. Finally, having computed all of your hidden-layer activations, you can make your predictions. For example, the prediction ŷ⟨t⟩ is an activation function applied to W_y with both the forward activation a→⟨t⟩ and the backward activation a←⟨t⟩ at time t fed in, so ŷ⟨t⟩ = g(W_y [a→⟨t⟩, a←⟨t⟩] + b_y).
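To make that computation concrete, here is a minimal NumPy sketch of a BRNN forward pass. This is not the course's reference code: the parameter names (Waa_f, Wax_f, Waa_b, Wax_b, Wy, by), the tanh cells in both directions, and the sigmoid output for a per-word binary label are all illustrative assumptions.

```python
# A minimal sketch of a BRNN forward pass, assuming tanh cells in both
# directions and a sigmoid output (e.g. per-word "is this part of a name?").
# Parameter names and shapes are illustrative, not the course's notation.
import numpy as np

def brnn_forward(x_seq, params):
    """x_seq: list of T input vectors x<1>..x<T>, each of shape (n_x, 1)."""
    Waa_f, Wax_f, ba_f = params["Waa_f"], params["Wax_f"], params["ba_f"]
    Waa_b, Wax_b, ba_b = params["Waa_b"], params["Wax_b"], params["ba_b"]
    Wy, by = params["Wy"], params["by"]
    T, n_a = len(x_seq), Waa_f.shape[0]

    # Forward recurrence: a_fwd<t> depends on a_fwd<t-1> and x<t>, left to right.
    a_fwd = [np.zeros((n_a, 1)) for _ in range(T + 1)]
    for t in range(1, T + 1):
        a_fwd[t] = np.tanh(Waa_f @ a_fwd[t - 1] + Wax_f @ x_seq[t - 1] + ba_f)

    # Backward recurrence: a_bwd<t> depends on a_bwd<t+1> and x<t>, right to left.
    a_bwd = [np.zeros((n_a, 1)) for _ in range(T + 2)]
    for t in range(T, 0, -1):
        a_bwd[t] = np.tanh(Waa_b @ a_bwd[t + 1] + Wax_b @ x_seq[t - 1] + ba_b)

    # Prediction at each t uses both directions: y_hat<t> = g(Wy [a_fwd<t>; a_bwd<t>] + by).
    y_hat = []
    for t in range(1, T + 1):
        z = Wy @ np.concatenate([a_fwd[t], a_bwd[t]], axis=0) + by
        y_hat.append(1.0 / (1.0 + np.exp(-z)))  # sigmoid
    return y_hat, a_fwd[1:], a_bwd[1:T + 1]
```

The point to notice in the sketch is that the forward and backward recurrences have separate parameters and never feed into each other; they only meet at the output layer, which is why both sweeps can run independently once the whole input sequence is available.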
So if you look at the prediction at time step 3, for example, information from x⟨1⟩ can flow through a→⟨1⟩ to a→⟨2⟩, which also takes in x⟨2⟩, then to a→⟨3⟩, and into ŷ⟨3⟩. So information from x⟨1⟩, x⟨2⟩, and x⟨3⟩ is all taken into account, and information from x⟨4⟩ can flow through a←⟨4⟩ to a←⟨3⟩ and into ŷ⟨3⟩. This allows the prediction at time 3 to take as input information from the past, information from the present, which goes into both the forward and backward activations at this time step, as well as information from the future (there's a small numerical check of this flow at the end, using the sketch above). In particular, given a phrase like "He said, 'Teddy Roosevelt...'", to predict whether Teddy is part of a person's name, you get to take into account information from both the past and the future.

So this is the bidirectional recurrent neural network. These blocks can be not just standard RNN blocks; they can also be GRU blocks or LSTM blocks. In fact, for a lot of NLP problems, for a lot of text or natural language processing problems, a bidirectional RNN with LSTM blocks is commonly used. So if you have an NLP problem where you have a complete sentence and you're trying to label things in that sentence, a bidirectional RNN with LSTM blocks, forward and backward, would be a pretty reasonable first thing to try.

So that's it for the bidirectional RNN. This is a modification you can make to the basic RNN architecture, or to the GRU, or to the LSTM. By making this change, you have a model that uses an RNN, GRU, or LSTM and is able to make predictions anywhere, even in the middle of the sequence, while taking into account information potentially from the entire sequence. The disadvantage of the bidirectional RNN is that you need the entire sequence of data before you can make predictions anywhere. So, for example, if you're building a speech recognition system, a BRNN would let you take into account the entire speech utterance. But with this straightforward implementation, you'd need to wait for the person to stop talking, to get the entire utterance, before you could actually process it and make a speech recognition prediction. So real-time speech recognition applications use somewhat more complex models rather than just the standard bidirectional RNN you've seen here. But for a lot of natural language processing applications, where you can get the entire sentence all at the same time, the standard BRNN algorithm is actually very effective.

So that's it for BRNNs. In the next and final video for this week, let's talk about how to take all of these ideas, RNNs, LSTMs, GRUs, and their bidirectional versions, and construct deep versions of them.
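As a quick check of that information flow, here is a short usage example that reuses the brnn_forward sketch above; the sizes and random parameters are made up purely for illustration. Changing only x⟨4⟩ changes ŷ⟨3⟩, which is exactly why a BRNN needs the whole sequence before it can predict anywhere.

```python
# Reusing the brnn_forward sketch above with made-up sizes and random
# parameters: perturbing only x<4> changes y_hat<3>, because information
# from the future flows through a_bwd<4> -> a_bwd<3> into the prediction.
import numpy as np

np.random.seed(0)
n_x, n_a, n_y, T = 5, 4, 1, 4
params = {
    "Waa_f": 0.1 * np.random.randn(n_a, n_a), "Wax_f": 0.1 * np.random.randn(n_a, n_x),
    "ba_f": np.zeros((n_a, 1)),
    "Waa_b": 0.1 * np.random.randn(n_a, n_a), "Wax_b": 0.1 * np.random.randn(n_a, n_x),
    "ba_b": np.zeros((n_a, 1)),
    "Wy": 0.1 * np.random.randn(n_y, 2 * n_a), "by": np.zeros((n_y, 1)),
}
x_seq = [np.random.randn(n_x, 1) for _ in range(T)]

y_hat, _, _ = brnn_forward(x_seq, params)

x_seq_alt = list(x_seq)
x_seq_alt[3] = x_seq[3] + 1.0  # change only the last input, x<4>
y_hat_alt, _, _ = brnn_forward(x_seq_alt, params)

# y_hat<3> differs between the two runs: a prediction in the middle of the
# sequence depends on inputs that come after it, so the full sequence is needed.
print(y_hat[2].item(), y_hat_alt[2].item())
```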