In other cases, we sometimes also take the logarithm of the product

to convert this into a sum of log probabilities.

This can help preserve precision as well, but

in this case we cannot use the logarithm alone to solve the problem,

because there is a sum in the denominator. But

this kind of normalization can be effective for solving this problem.

So it's a technique that's sometimes useful in other

situations as well.
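As a minimal sketch of this normalization idea (not code from the lecture), the standard way to normalize probabilities that are held in log space is to subtract the maximum log value before exponentiating, so the sum in the denominator can be computed without underflow. The numbers here are hypothetical toy values.

```python
import math

def normalize_log_probs(log_probs):
    """Normalize probabilities given in log space without underflow.

    Subtracting the max before exponentiating keeps every exponent in a
    safe range; this works even though the sum in the denominator means
    we cannot stay in log space the whole way through.
    """
    m = max(log_probs)
    exps = [math.exp(lp - m) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: two topics whose document likelihoods are far
# too small to exponentiate directly.
log_p = [math.log(1e-200) + math.log(0.5),
         math.log(3e-200) + math.log(0.5)]
posterior = normalize_log_probs(log_p)  # roughly [0.25, 0.75]
```

Without the max subtraction, `math.exp` on these values would underflow to zero and the normalization would divide by zero.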

Now let's look at the M-Step.

So from the E-step we can see our estimate of which distribution is more likely to

have generated each document d.

And you can see d1 is more likely from the first topic,

whereas d2 is more likely from the second topic, etc.

Now, let's think about what we need to compute in the M-step. Well,

basically we need to re-estimate all the parameters.

First, look at p of theta 1 and p of theta 2.

How do we estimate that?

Intuitively, you can just pool together these z values, the probabilities from the E-step.

So if all of these documents say, well they're more likely from theta 1,

then we intuitively would give a higher probability to theta 1.

In this case, we can just take an average of these

probabilities that you see here, and we obtain 0.6 for theta 1.

So theta 1 is more likely than theta 2.

And you can see the probability of theta 2 would naturally be 0.4.
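This averaging step can be sketched in a few lines. The posterior values below are hypothetical (the lecture's slide values are not reproduced here); they are chosen only so the average comes out to 0.6, matching the estimate discussed above.

```python
# Hypothetical E-step posteriors z_d = p(theta_1 | d) for four documents.
z = [0.9, 0.3, 0.8, 0.4]

# M-step estimate of the topic prior: average the posteriors.
p_theta1 = sum(z) / len(z)   # 0.6

# With only two topics, the priors must sum to one.
p_theta2 = 1.0 - p_theta1    # 0.4
```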

What about the word probabilities?

Well, we do the same, and the intuition is the same.

So, in order to estimate the probabilities of words in theta 1,

we're going to look at which documents have been generated from theta 1.

And we're going to pool together the words in those documents and normalize them.

So this is basically what I just said.
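The word re-estimation just described can be sketched as follows: each word occurrence is counted with weight equal to its document's E-step posterior for theta 1, and the weighted counts are then normalized into a distribution. The documents and posteriors here are hypothetical toy data.

```python
from collections import Counter

def reestimate_word_probs(docs, z):
    """M-step re-estimation of p(w | theta_1) for a two-topic mixture.

    docs: list of documents, each a list of words (toy data)
    z:    E-step posteriors p(theta_1 | d), one per document

    Documents more likely generated by theta_1 contribute more weight
    to the pooled counts, which are then normalized to sum to one.
    """
    weighted = Counter()
    for doc, zd in zip(docs, z):
        for w in doc:
            weighted[w] += zd
    total = sum(weighted.values())
    return {w: c / total for w, c in weighted.items()}

docs = [["text", "mining", "text"], ["sports", "game"]]
z = [0.9, 0.2]   # hypothetical posteriors for theta_1
p_w = reestimate_word_probs(docs, z)
```

Here "text" ends up with the highest probability under theta 1, since it occurs twice in the document that the E-step assigned mostly to theta 1.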