In the last video, you saw some of the wide range of applications through which you can apply sequence models. Let's start by defining a notation that we'll use to build up these sequence models. As a motivating example, let's say you want to build a sequence model to input a sentence like this, Harry Potter and Hermione Granger invented a new spell. And these are characters by the way, from the Harry Potter sequence of novels by J. K. Rowling. And let say you want a sequence model to automatically tell you where are the peoples names in this sentence. So, this is a problem called Named-entity recognition and this is used by search engines for example, to index all of say the last 24 hours news of all the people mentioned in the news articles so that they can index them appropriately. And name into the recognition systems can be used to find people's names, companies names, times, locations, countries names, currency names, and so on in different types of text. Now, given this input x let's say that you want a model to operate y that has one outputs per input word and the target output the design y tells you for each of the input words is that part of a person's name. And technically this maybe isn't the best output representation, there are some more sophisticated output representations that tells you not just is a word part of a person's name, but tells you where are the start and ends of people's names their sentence, you want to know Harry Potter starts here, and ends here, starts here, and ends here. But for this motivating example, I'm just going to stick with this simpler output representation. Now, the input is the sequence of nine words. So, eventually we're going to have nine sets of features to represent these nine words, and index into the positions and sequence, I'm going to use X and then superscript angle brackets 1, 2, 3 and so on up to X angle brackets nine to index into the different positions. I'm going to use X_t with the index t to index into positions, in the middle of the sequence. And t implies that these are temporal sequences although whether the sequences are temporal one or not, I'm going to use the index t to index into the positions in the sequence. And similarly for the outputs, we're going to refer to these outputs as y and go back at 1, 2, 3 and so on up to y nine. Let's also used T sub of x to denote the length of the input sequence, so in this case there are nine words. So T_x is equal to 9 and we used T_y to denote the length of the output sequence. In this example T_x is equal to T_y but you saw on the last video T_x and T_y can be different. So, you will remember that in the notation we've been using, we've been writing X round brackets i to denote the i training example. So, to refer to the TIF element or the TIF element in the sequence of training example i will use this notation and if Tx is the length of a sequence then different examples in your training set can have different lengths. And so Tx_i would be the input sequence length for training example i, and similarly y i t means the TIF element in the output sequence of the i for an example and Ty_i will be the length of the output sequence in the i training example. So into this example, Tx_i is equal to 9 would be the highly different training example with a sentence of 15 words and Tx_i will be close to 15 for that different training example. Now, that we're starting to work in NLP or Natural Language Processing. Now, this is our first serious foray into NLP or Natural Language Processing. And one of the things we need to decide is, how to represent individual words in the sequence. So, how do you represent a word like Harry, and why should x_1 really be? Let's next talk about how we would represent individual words in a sentence. So, to represent a word in the sentence the first thing you do is come up with a Vocabulary. Sometimes also called a Dictionary and that means making a list of the words that you will use in your representations. So the first word in the vocabulary is a, that will be the first word in the dictionary. The second word is Aaron and then a little bit further down is the word and, and then eventually you get to the words Harry then eventually the word Potter, and then all the way down to maybe the last word in dictionary is Zulu. And so, a will be word one, Aaron is word two, and in my dictionary the word and appears in positional index 367. Harry appears in position 4075, Potter in position 6830, and Zulu is the last word to the dictionary is maybe word 10,000. So in this example, I'm going to use a dictionary with size 10,000 words. This is quite small by modern NLP applications. For commercial applications, for visual size commercial applications, dictionary sizes of 30 to 50,000 are more common and 100,000 is not uncommon. And then some of the large Internet companies will use dictionary sizes that are maybe a million words or even bigger than that. But you see a lot of commercial applications used dictionary sizes of maybe 30,000 or maybe 50,000 words. But I'm going to use 10,000 for illustration since it's a nice round number. So, if you have chosen a dictionary of 10,000 words and one way to build this dictionary will be be to look through your training sets and find the top 10,000 occurring words, also look through some of the online dictionaries that tells you what are the most common 10,000 words in the English Language saved. What you can do is then use one hot representations to represent each of these words. For example, x-1 which represents the word Harry would be a vector with all zeros except for a 1 in position 4075 because that was the position of Harry in the dictionary. And then x_2 will be again similarly a vector of all zeros except for a 1 in position 6830 and then zeros everywhere else. The word and was represented as position 367 so x_3 would be a vector with zeros of 1 in position 367 and then zeros everywhere else. And each of these would be a 10,000 dimensional vector if your vocabulary has 10,000 words. And this one A, I guess because A is the first whether the dictionary, then x_7 which corresponds to word a, that would be the vector 1. This is the first element of the dictionary and then zero everywhere else. So in this representation, x_t for each of the values of t in a sentence will be a one-hot vector, one-hot because there's exactly one one is on and zero everywhere else and you will have nine of them to represent the nine words in this sentence. And the goal is given this representation for X to learn a mapping using a sequence model to then target output y, I will do this as a supervised learning problem, I'm sure given the table data with both x and y. Then just one last detail, which we'll talk more about in a later video is, what if you encounter a word that is not in your vocabulary? Well the answer is, you create a new token or a new fake word called Unknown Word which under note as follows and go back as UNK to represent words not in your vocabulary, we'll come more to talk more about this later. So, to summarize in this video, we described a notation for describing your training set for both x and y for sequence data. In the next video let's start to describe a Recurrent Neural Networks for learning the mapping from X to Y.