There are two terms that factor into this probability calculation. The first is how much this document likes this topic. Previously, we used pi i, the document-specific topic proportions, to inform us of this. But now, when we don't have these topic proportions sampled, we look at every other word in the document and see how often each topic was used. And that's exactly what this equation shows here. In particular, we have the total count of words in document i assigned to topic k, and then we add alpha, which is a smoothing parameter coming from our Bayesian prior. In the denominator, we have the total number of words in document i, and the reason we have minus one is because we're ignoring the current word. When we form this count n i k, it does not include the word whose assignment we've removed, so when we normalize, we normalize by N i minus one. And then we have K different topics, each with that alpha smoothing parameter added, so we have K times alpha in the denominator. So what we see in this plot is that this document really likes topic one and topic three, because we have two counts of each, and it doesn't like topic two all that much. There are actually no counts of topic two; the only reason there's positive weight on topic two is our Bayesian prior. Okay, well, this term is then weighted by another term, which has to do with how much each topic likes the observed word. So we look at the word dynamic, and now we look throughout the entire corpus, and we ask how many times that word was assigned to a given topic. In equations, it has the following form, where, corpus-wide, we look at the counts of assignments of the word dynamic to topic k. Then we have this smoothing parameter again from our Bayesian prior, and then we just normalize. And what's the normalization that we use here?
Well, we're fixing a given topic and normalizing by how much it likes any word. So we're summing over all possible words in the vocabulary and looking at how many times those words were assigned to this topic, and then this is the normalization from the prior, where V is the size of the vocabulary. So, just to be clear, when we don't have the topic-specific vocabulary distributions that tell us the probabilities of every word in the vocabulary, we're going to use the word assignments in the corpus as a surrogate. And just to interpret this plot, what we see is that topic one really likes the word dynamic. That's a topic about things that are more technology-based. But we see that topic two also likes the word dynamic, in some different context. It's some other topic, so maybe it's a topic about fluid dynamics, and so the word dynamic is used very frequently in that topic, whereas topic three doesn't like the word dynamic much at all. So what we see in these pictures is that topic one fits both the word dynamic and the document: the document likes topic one, and topic one likes the word dynamic. For topic two, we see that the topic fits the word, so topic two likes the word dynamic, but the document doesn't really like topic two, because this document is probably not about fluid dynamics. And finally, topic three doesn't like the word dynamic, but the document really likes this topic, so this might be some other topic present in this document that is unrelated to the word dynamic. So when we go to form our probability of an assignment of the word dynamic to any of our topics, we simply multiply these two probabilities. So we just look at the areas of these squares, and this is going to represent the probability of choosing topic one versus topic two versus topic three. So to draw our assignment, we can think of two different analogies here.
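To make the two terms concrete, here is a minimal NumPy sketch of the calculation just described. The document's topic counts (two, zero, two) and the count of 10 for the word dynamic under topic one match the lecture's example; the hyperparameters alpha and gamma, the vocabulary size, and the remaining corpus-wide counts are made-up placeholders for illustration.

```python
import numpy as np

# Hypothetical smoothing parameters and vocabulary size (not from the slides).
alpha, gamma, V = 0.5, 0.1, 1000
K = 3  # number of topics

# Local counts: words in document i assigned to each topic, with the
# current word's own assignment already removed.
n_ik = np.array([2, 0, 2])

# Term 1: how much the document likes each topic.
# (n_ik + alpha) / (N_i - 1 + K*alpha); n_ik.sum() is already N_i - 1.
doc_term = (n_ik + alpha) / (n_ik.sum() + K * alpha)

# Corpus-wide counts: assignments of the word "dynamic" to each topic
# (the 10 under topic one is from the lecture; the rest are invented),
# and total assignments of any vocabulary word to each topic.
m_wk = np.array([10, 8, 1])
m_k = np.array([100, 90, 70])

# Term 2: how much each topic likes the word.
word_term = (m_wk + gamma) / (m_k + V * gamma)

# Multiply the two terms (the "areas of the squares") and normalize
# to get the probability of assigning this word to each topic.
p = doc_term * word_term
p /= p.sum()
```

With these numbers, topic one gets the largest probability, matching the picture where it fits both the word and the document.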
One is, we can think of rolling a K-sided die, where the probability of every side of that die is given by the relative proportion of these areas. Or, equivalently, we can think of just throwing a dart. So think of stacking up all the green regions together and throwing a dart, and let's say there's a uniform chance that we hit any part of the green space. Well, we're most likely to hit this green space, which would be an assignment to topic one. There's some probability of hitting topic two or topic three, but it's less likely. So in this case, let's say we happen to sample a value of one, an assignment of the word dynamic to topic one. So we see that here. This is our new assignment of this word in this document. And now what we have to do is go and increment our counts. So we go to our local counts, and we say, okay, we now need to increase the count of topic one. So instead of two counts, we have three counts. And then, again, we go to our corpus-wide statistics, look at the word dynamic, and increment the count of the word dynamic being associated with topic one. So this 10 becomes an 11. So in pictures, what we've done is simply increased how much topic one likes the word dynamic, and how much document i likes topic one. Okay, well, now that we've reassigned this word in this document, we can move on to the next word and go through exactly the same process: we remove its current assignment and decrement the associated counts in the local and corpus-wide tables, then we form the probability of an assignment of this word using those two terms, how much the document likes a given topic and how much the topic likes this word, and then we re-sample this value.
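The die-roll draw and the count increments for a single word can be sketched as below. The count arrays mirror the lecture's example; the function name and the degenerate probability vector used in the demo are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily

def resample_and_increment(p, n_ik, m_wk, rng):
    """Roll the K-sided die with face probabilities p, then increment
    the local document-topic count and the corpus-wide word-topic
    count for the newly sampled topic."""
    k_new = rng.choice(len(p), p=p)
    n_ik[k_new] += 1   # document i now likes topic k_new a bit more
    m_wk[k_new] += 1   # topic k_new now likes this word a bit more
    return k_new

# Mirroring the lecture: if topic one (index 0) is drawn, the document's
# topic-one count goes 2 -> 3 and the corpus-wide count of "dynamic"
# under topic one goes 10 -> 11. We force that draw here with p = [1,0,0].
n_ik = np.array([2, 0, 2])
m_wk = np.array([10, 8, 1])
k = resample_and_increment(np.array([1.0, 0.0, 0.0]), n_ik, m_wk, rng)
```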
And then we move to the next word, and the next word, and we cycle all the way through the words in this document, and then we move to the next document, and we keep going. And once we've gotten through all the words in the corpus, we continue: we go again and again, doing many, many passes through the words in the corpus, reassigning their topic indicators, until we run out of our computational budget.
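Putting the whole sweep together, here is a minimal, unoptimized sketch of the collapsed Gibbs sampler just described. The function name, data layout, and default hyperparameter values are illustrative assumptions, not the course's reference implementation. One simplification worth noting: the document-term denominator N_i - 1 + K*alpha is the same for every topic, so it cancels when we normalize and can be dropped.

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha=0.5, gamma=0.1, n_passes=50, seed=0):
    """docs: list of documents, each a list of word ids in [0, V).
    Returns topic assignments z, document-topic counts, word-topic counts."""
    rng = np.random.default_rng(seed)
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random initialization
    n_dk = np.zeros((len(docs), K), dtype=int)  # local doc-topic counts
    m_wk = np.zeros((V, K), dtype=int)          # corpus-wide word-topic counts
    m_k = np.zeros(K, dtype=int)                # total assignments per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; m_wk[w, k] += 1; m_k[k] += 1

    for _ in range(n_passes):                   # many passes through the corpus
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                     # remove the current assignment
                n_dk[d, k] -= 1; m_wk[w, k] -= 1; m_k[k] -= 1
                # how much the doc likes each topic, times how much each
                # topic likes this word (constant denominator dropped)
                p = (n_dk[d] + alpha) * (m_wk[w] + gamma) / (m_k + V * gamma)
                p /= p.sum()
                k = rng.choice(K, p=p)          # roll the K-sided die
                z[d][i] = k                     # record the new assignment
                n_dk[d, k] += 1; m_wk[w, k] += 1; m_k[k] += 1
    return z, n_dk, m_wk
```

A quick way to sanity-check a run is that the counts always stay consistent: the document-topic counts and the word-topic counts each sum to the total number of words in the corpus after every pass.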