Sometimes we also take the logarithm of a product to convert it into a sum of log probabilities. That can help preserve precision as well, but in this case we cannot use the logarithm to solve the problem, because there is a sum in the denominator. This kind of normalization, though, can be effective for solving the precision problem, so it's a technique that's sometimes useful in other situations as well.
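As an aside, the standard way around that sum in the denominator is the log-sum-exp trick: shift every log probability by the maximum before exponentiating, so nothing underflows inside the sum. Here is a minimal sketch in Python; the two-topic setup and the function name are illustrative, not from the lecture.

```python
import math

def posterior_log_space(log_p1, log_p2):
    """Compute p(z=1 | d) from the log joint probabilities of a document
    under two topics, keeping the denominator's sum numerically stable."""
    m = max(log_p1, log_p2)  # shift by the max so the largest term is exp(0) = 1
    log_denom = m + math.log(math.exp(log_p1 - m) + math.exp(log_p2 - m))
    return math.exp(log_p1 - log_denom)
```

Because the largest shifted term is exactly 1, the sum inside the log never underflows, even when both joint probabilities are astronomically small.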
Now let's look at the M-Step.
So from the E-step we have an estimate of which distribution is more likely to have generated each document d. You can see that d1 is more likely from the first topic, whereas d2 is more likely from the second topic, and so on.
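To make the E-step concrete, here is a small sketch of that computation for the two-topic, document-level mixture described here. The function and variable names are my own, and a real implementation would work in log space as noted above.

```python
def e_step(doc_counts, word_prob_1, word_prob_2, p_theta_1):
    """For each document, the posterior probability z that it was
    generated by theta_1 rather than theta_2.
    doc_counts: list of {word: count} dicts, one per document.
    Assumes every word has a probability under both topics."""
    z = []
    for counts in doc_counts:
        # joint probability of the document under each topic: prior * likelihood
        joint1, joint2 = p_theta_1, 1.0 - p_theta_1
        for w, c in counts.items():
            joint1 *= word_prob_1[w] ** c
            joint2 *= word_prob_2[w] ** c
        z.append(joint1 / (joint1 + joint2))  # normalize over the two topics
    return z
```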
Now, let's think about what we need to compute in the M-step. Basically, we need to re-estimate all the parameters. First, look at p of theta 1 and p of theta 2. How do we estimate those? Intuitively, we can just pool together these z values, the probabilities from the E-step. So if all of these documents say they're more likely from theta 1, then we would intuitively give a higher probability to theta 1. In this case, we can just take an average of the probabilities you see here, and we obtain 0.6 for theta 1. So theta 1 is more likely than theta 2, and the probability of theta 2 would naturally be 0.4.
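In code, re-estimating the topic prior is just an average of the E-step posteriors. A minimal sketch, reusing the z list from the E-step sketch above; the example values are made up so that the average comes out to 0.6 as in the lecture.

```python
def m_step_prior(z):
    """Re-estimate p(theta_1) as the average of the per-document posteriors."""
    p_theta_1 = sum(z) / len(z)
    return p_theta_1, 1.0 - p_theta_1

# Hypothetical posteriors for three documents, averaging to 0.6:
p1, p2 = m_step_prior([0.8, 0.4, 0.6])  # p1 = 0.6, p2 = 0.4
```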
What about the word probabilities? Well, we do the same, and the intuition is the same. To estimate the probabilities of words in theta 1, we look at which documents have been generated from theta 1, pool together the words in those documents, and normalize the counts. So this is basically what I just said.
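Here is a sketch of that pooling step, under the same assumptions as the snippets above: each word's count is weighted by the document's posterior z, giving a "soft" count for theta 1, and the weighted counts are then normalized into a distribution.

```python
def m_step_word_probs(doc_counts, z):
    """Re-estimate p(w | theta_1) from posterior-weighted word counts."""
    weighted = {}
    for counts, z_d in zip(doc_counts, z):
        for w, c in counts.items():
            # each document contributes its counts in proportion to z
            weighted[w] = weighted.get(w, 0.0) + z_d * c
    total = sum(weighted.values())
    return {w: v / total for w, v in weighted.items()}  # normalize to sum to 1
```

The estimate for theta 2 is symmetric; you would weight each document's counts by 1 - z instead.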