So now that we have re-introduced ourselves to this concept of mapping a word to a vector, we have gained some further insight into what this mapping represents. So now let's move forward and actually start to model sequences of words, or sentences, or documents. Let's consider a situation where we have N words, represented as W_1, W_2, through W_N. Those correspond to the N words, for example, in a sentence or a document. Based upon the understanding that we now have, every one of those words will be mapped to a d-dimensional vector or code: C_1 is the code associated with W_1, C_2 is the code associated with W_2, and C_N is the code associated with W_N. One thing that might bother you at this point is that this concept of mapping each word to a single vector is restrictive, in the sense that if you look at a word in the dictionary, it has multiple definitions, and the definition that applies depends on the context in which the word is used. So this idea of mapping each word to a single vector, which implies a single meaning for that word, in the sense that we talked about earlier, seems restrictive because it does not take into account the context of the surrounding words. What we would like to do now is develop a framework by which we can modify these word vectors in a way that takes into account the meaning implied by the surrounding words. To do that, we need to introduce the concept of an inner product between two word vectors. This is a new concept that we haven't talked about very much thus far, and we're going to introduce it in a relatively simple way. So again, we have N words, W_1 through W_N, and those N words are mapped to N codes C_1, C_2, through C_N. Those mappings are fixed; in other words, the word W_1 is mapped to the code C_1 independent of all of the surrounding words. In the sense that we talked about earlier, this is problematic, and we're going to extend it.
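The setup above can be sketched in a few lines of code. This is a toy illustration, not a real embedding model: the vocabulary, the sentence, and the random vectors are all invented here, and real codes would be learned from data.

```python
import numpy as np

# A toy embedding table: each word in a small vocabulary is mapped to a
# fixed d-dimensional code. The vectors are random placeholders; learned
# embeddings would replace them in practice.
d = 4
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
embedding = {w: rng.standard_normal(d) for w in vocab}

sentence = ["the", "cat", "sat", "on", "the", "mat"]  # W_1 ... W_N
codes = [embedding[w] for w in sentence]              # C_1 ... C_N

# "the" appears twice and receives the identical code both times --
# exactly the context-independence the discussion calls restrictive.
assert np.array_equal(codes[0], codes[4])
```

Note that the lookup is purely word-by-word: nothing about the surrounding words influences any code, which is the limitation the inner product machinery below begins to address.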
To do that, we need to introduce a little bit of notation. Again, C_1 is the code associated with word 1. It has d components, and in this figure what we mean to represent is the d components of that word vector: C_1,1 is the first component of code C_1, C_1,2 is the second component of code C_1, and C_1,d is the d-th component of that code. What we're going to do now is introduce the concept of an inner product between two codes. Let's consider codes C_1 and C_2, associated with word 1 and word 2, and suppose we want to quantify the relationship between those two codes. The way that we do this is with something called an inner product, also called a dot product. The reason it's called a dot product is that notationally we write it C_1 · C_2. The figure at the bottom is meant to depict what that equation is saying, and hopefully the picture makes the equation meaningful. So now we have two codes, C_1 and C_2, each of which has d components in the notation we just introduced. We take the first component of code 1 and the first component of code 2 and multiply them together; the second component of code 1 and the second component of code 2, and multiply them together; the third component of code 1 and the third component of code 2, multiply them together; and so on for each of the d components. Then we take those d products and sum them together. That Sigma symbol means summation. This process of multiplying component by component and then summing, which is represented by the equation in the middle, C_1 · C_2 = C_1,1 C_2,1 + C_1,2 C_2,2 + … + C_1,d C_2,d, is called an inner product, for reasons that we're going to talk about subsequently.
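The multiply-then-sum process just described can be written out directly. A minimal sketch, with invented three-component vectors standing in for the codes:

```python
# Two codes with d = 3 components each; the values are made up for
# illustration only.
c1 = [0.5, -1.0, 2.0]
c2 = [1.5,  0.5, 0.25]

# Multiply matching components one by one, then sum the d products --
# exactly the Sigma in the inner-product equation.
dot = sum(a * b for a, b in zip(c1, c2))
print(dot)  # 0.5*1.5 + (-1.0)*0.5 + 2.0*0.25 = 0.75

# NumPy provides the same operation as a built-in:
import numpy as np
assert np.isclose(np.dot(c1, c2), dot)
```

The hand-written generator expression and NumPy's `dot` compute the identical quantity; in practice one would use the library routine.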
What the inner product is going to do is quantify how similar code 1 is to code 2, and of course, if we can do this for codes C_1 and C_2, we can do it for any two codes in our vocabulary. So if words W_1 and W_2 are similar, then we would expect codes C_1 and C_2 to also be similar, because remember, in the sense that we talked about earlier, the components of those codes represent the underlying meaning of the words. If codes C_1 and C_2 are similar, then the dot product C_1 · C_2 should be large and positive, in a sense we'll make concrete in a moment. If codes C_1 and C_2 are different, which means the corresponding words W_1 and W_2 are dissimilar, then we would expect that inner product to be small or negative, and we'll see this in a moment. This idea of the inner product as a quantifier of similarity between words is fundamental to what is going to come subsequently. Just to underscore the point: if words W_1 and W_2 are similar, the inner product C_1 · C_2 will tend to be positive; if the two words are dissimilar, C_1 · C_2 will tend to be negative. This will become more clear as we proceed. This concept of using the inner product to quantify the similarity of words through their word vectors is fundamental to the neural processing that we're going to do subsequently for natural language processing, and we're going to use it repeatedly in the discussion that follows. The highlighted block shows the same expression that I showed before. It is a lot to carry around, so what we're going to do is introduce a concise notation for the inner product.
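The sign behavior described above can be seen numerically. The vectors below are invented for illustration: "cat" and "kitten" are given codes pointing in nearly the same direction, while "cat" and "economy" point in roughly opposite directions.

```python
import numpy as np

# Hand-crafted (hypothetical) codes -- not learned embeddings.
cat     = np.array([ 1.0,  0.9,  0.1])
kitten  = np.array([ 0.9,  1.0,  0.2])
economy = np.array([-0.8, -0.7,  0.5])

# Similar words -> codes align -> large positive inner product.
print(np.dot(cat, kitten))   # 1.82

# Dissimilar words -> codes oppose -> negative inner product.
print(np.dot(cat, economy))  # -1.38
```

With learned embeddings the same pattern holds in high dimensions: the inner product is large and positive when the component-wise products mostly reinforce, and negative when they mostly cancel or oppose.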
So the inner product between codes C_1 and C_2 is henceforth going to be represented visually as you see it on the right. Again, what this means is that we take codes C_1 and C_2, multiply each of their d components one by one, and then sum those products together.