Now that you've processed your text corpus, it's time to populate the transition matrix, which holds the probabilities of going from one state to another in your Markov model. Begin by filling the first column of your matrix with the counts of the associated tags. Remember, the rows in the matrix represent the current states, and the columns represent the next states. The values represent the transition probabilities of going from the current state to the next state. The states for this use case are the part-of-speech tags. As you can see, the defined tags and the elements in the corpus are marked with corresponding colors.

For the first column, you will count the occurrences of the following tag combinations. As you can see here, a noun following a start token occurs once in our corpus. A noun following a noun doesn't occur at all. A noun following a verb doesn't occur either. A noun following the other tag, O, occurs six times.

The rest of the matrix is populated accordingly, but I'll take a little shortcut: the corpus, as you can see, is a verb-less haiku. Because there are no tag combinations with the tag VB, those counts are all zero. Unfortunately, there is no such shortcut in the programming assignments. You've been warned. The O tag following a start token occurs twice. The O tag following a noun tag, NN, occurs six times. The last entry in the transition matrix, an O tag following an O tag, has a count of eight. In the last line, you have to take into account the tagged words "on," "a," "wet," the comma, and "black" to calculate the correct count.

Now that you have calculated the counts of all tag combinations in the matrix, you can calculate the transition probabilities. So far, you've calculated and entered the counts in the matrix, which correspond to the numerator in our formula. Now, you just have to divide each count by the corresponding row sum. Remember what this row sum represents: for the row where the current state is a noun part of speech, the sum across that row covers all pairs of words where the current state is a noun and the next state is any part of speech, whether it's a noun, a verb, or other. So for the transition probability of a noun tag, NN, following a start token, or in other words, the initial probability of an NN tag, you divide 1 by 3. Likewise, for the transition probability of a noun tag following an O tag, you divide 6 by 14.

You may have realized that there are two problems here. One is that the row sum of the VB tag is zero, which would lead to a division by zero using this formula. The other is that a lot of entries in the transition matrix are zero, meaning that these transitions will have probability zero. This won't work if you want the model to generalize to other corpora, which might actually contain verbs. To handle this, change your formula slightly by adding a small value epsilon to each of the counts in the numerator, and adding N times epsilon to the divisor, where N is the number of tags, so that each row still sums to one. In other words, P(t_i | t_{i-1}) = (C(t_{i-1}, t_i) + epsilon) / (C(t_{i-1}) + N * epsilon). This operation is also referred to as smoothing, which you might remember from previous lessons.

So if you substitute epsilon with a small value, say 0.001, you will get the following transition matrix. The values shown here are rounded to three decimal places, so don't worry if the row sums don't add up to one exactly. The result of smoothing, as you can see, is that you no longer have any zero-valued entries in the transition matrix A.
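If you'd like to see these numbers computed, here is a minimal sketch in Python (not part of the lecture). It hard-codes the counts stated above; the tag ordering, the epsilon value, and the function name are my own choices for illustration.

```python
import numpy as np

# Next-state tags; "<s>" is the start token, which only ever appears as a
# current state, so the count matrix has 4 rows and 3 columns.
next_tags = ["NN", "VB", "O"]
current_states = ["<s>", "NN", "VB", "O"]

# Transition counts from the lecture's verb-less haiku corpus, e.g.
# counts[0, 0] = 1 because a noun follows a start token exactly once.
counts = np.array([
    [1, 0, 2],   # <s> -> NN, VB, O
    [0, 0, 6],   # NN  -> NN, VB, O
    [0, 0, 0],   # VB  -> NN, VB, O  (no verbs anywhere in the corpus)
    [6, 0, 8],   # O   -> NN, VB, O
], dtype=float)

def transition_matrix(counts, epsilon=0.001):
    """Smoothed transition probabilities:
    P(t_i | t_{i-1}) = (C(t_{i-1}, t_i) + epsilon) / (C(t_{i-1}) + N * epsilon)
    """
    n_tags = counts.shape[1]                      # N in the formula
    row_sums = counts.sum(axis=1, keepdims=True)  # C(t_{i-1}) per row
    return (counts + epsilon) / (row_sums + n_tags * epsilon)

A = transition_matrix(counts)
print(np.round(A, 3))
```

Each row of A sums to one, the start row comes out to roughly (0.333, 0.000, 0.666) instead of the unsmoothed (1/3, 0, 2/3), and no entry is exactly zero anymore.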
Further, since the transition probabilities from the VB state are all one-third, the outgoing transitions are equally likely. That's reasonable, since you didn't have any data to estimate these transition probabilities. One more thing before you go. In a real-world example, you might not want to apply smoothing to the initial probabilities in the first row of the transition matrix. That's because if you apply smoothing to that row by adding a small value to possibly zero-valued entries, you'll effectively allow a sentence to start with any part-of-speech tag, including punctuation.
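If you want to try that variant, one possibility (again just a sketch, building on the transition_matrix function above) is to smooth every row except the first:

```python
def transition_matrix_raw_start(counts, epsilon=0.001):
    """Smooth every row except the first, so tags that never start a
    sentence in the corpus keep an initial probability of exactly zero."""
    A = transition_matrix(counts, epsilon)
    start_total = counts[0].sum()
    if start_total > 0:                  # guard against division by zero
        A[0] = counts[0] / start_total   # unsmoothed initial probabilities
    return A
```

With this version, the start row of our small example stays at (1/3, 0, 2/3), so a sentence can never begin with a verb, and in a larger tag set it could never begin with punctuation either.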