0:00

[MUSIC]

Let's see how Metropolis-Hastings works in a simple, one-dimensional case. Again, it's not that useful in one dimension, but it's chosen just for illustrative purposes.

So let's say we want to sample from the bimodal distribution shown as the blue curve. And let's say we start with the orange point, which is at position 0 at iteration 0, so it's our initialization. Let's see how the iterations work here.

First of all, let's choose the proposal distribution Q. And to make everything simpler, let's just use a standard normal centered around the current point. So at each iteration, the Markov chain Q will propose to move somewhere around the current point, with variance 1.

And for example, at the first iteration, the Markov chain may propose this point, the red one. So now we have to compute the acceptance probability, that is, whether we want to accept this point or not.

This is the definition of the acceptance probability we have just derived in the previous video. But note that in this case, the two terms in the ratio, Q(x -> x') and Q(x' -> x), are just the same. This is a property of our current proposal distribution Q: it doesn't depend on the order of its arguments. Which means the Q terms simply cancel, and what is left is just the ratio of the densities at the new point and at the previous one.

Which kind of makes sense: it says that if the new point is more probable than the previous one, we'll definitely accept it, with probability 1. If that's not the case, then we'll maybe accept it and maybe not, depending on how much less probable the new point is than the previous one.
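To make the cancellation concrete, here is a minimal sketch in Python. The bimodal target below is my own stand-in for the blue curve, not the lecture's exact density:

```python
import math

def target(x):
    # Unnormalized stand-in for the "blue curve":
    # a mixture of two Gaussians centered at -2 and +2.
    return math.exp(-0.5 * (x - 2.0) ** 2) + math.exp(-0.5 * (x + 2.0) ** 2)

def acceptance_prob(x_cur, x_prop):
    # Symmetric proposal => the Q terms cancel,
    # leaving only the ratio of target densities.
    return min(1.0, target(x_prop) / target(x_cur))
```

If the proposed point is more probable than the current one, the ratio exceeds 1 and the min clips the acceptance probability to exactly 1.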

So in this particular case, for this particular red-dot proposal, the new point is much more probable than the previous one, almost four times more probable, which means that the acceptance probability here is 1 and we'll definitely keep this point. So at the end of iteration 1 we move to this new point.

Okay, what about iteration 2? Let's sample a point from the proposal again, and we'll be somewhere around here, for example. Again we compute the acceptance probability; here we moved a little bit towards the higher-density region. It's not that much higher, but since it is higher, we will definitely accept this point, so with probability 1 we're keeping it, okay.

New proposal: this one is trying to move to a really improbable region according to the blue curve, so we'll keep this point with probability 0.13. Now we're flipping a biased coin, which tells us to accept this point with probability 0.13 and to reject it with probability 0.87. So we flipped our biased coin, and in this particular example it told us to reject this point. Okay, why not?

Another proposal asks us to move here. This point is a little less probable than the current one, so we will keep it with probability 0.73, which means we will most probably keep it. Again we flip a biased coin, and this time it tells us to reject the point. Well, why not, it happens sometimes. It was more probable to accept it, but we happened to reject it. So again we stay at the same place we were at the previous iteration.

And finally, if you repeat this process for long enough, you will get a plot like this. You move around your sample space; here we have around 50 iterations, and sometimes you stay at the same place, so the plot becomes flat for a while. But generally you move around, and you can plot a histogram of the visited points, and it looks kind of close to the blue curve you want to sample from. So in this case we didn't get exactly the samples we wanted, the histogram is not exactly like the blue curve, but it's close, so it's a reasonable way to sample from the blue curve in this particular case.
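The whole loop described above can be sketched in a few lines. Again, the bimodal target and all constants here are illustrative assumptions, not the lecture's exact setup:

```python
import math
import random

random.seed(0)

def target(x):
    # Unnormalized bimodal density standing in for the blue curve.
    return math.exp(-0.5 * (x - 2.0) ** 2) + math.exp(-0.5 * (x + 2.0) ** 2)

def metropolis_hastings(n_iters, x0=0.0, proposal_std=1.0):
    samples = [x0]
    x = x0
    for _ in range(n_iters):
        x_prop = random.gauss(x, proposal_std)  # symmetric proposal Q
        if random.random() < min(1.0, target(x_prop) / target(x)):
            x = x_prop                          # accept the move
        samples.append(x)                       # on rejection, repeat x
    return samples

samples = metropolis_hastings(5000)
```

A histogram of `samples` (after discarding some burn-in) should roughly trace the two bumps of the target.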

Now the question is, what happens if we change our proposal? Let's say we use a Gaussian with less variance, so we always propose to move in tiny little steps around the previous point. Well, it kind of works, but in this situation it doesn't manage to move outside the low-density region where it started. So in 50 iterations it hasn't converged yet; it will definitely converge at some point, but we don't know when. And this means that using these small steps is a much less efficient choice here.

What about large steps? Well, if we increase the variance of our proposal distribution Q to, say, 100, then we get large steps, which is nice in terms of convergence and uncorrelated samples. But the chain will stay at the same place for really long periods. Because it will often happen that we sit at a nice, high-probability place, the Markov chain Q proposes to move far away from it, and we don't like that, so we stay where we were. And this means that our actual samples will be quite correlated, because we keep staying at the same place, and we waste the resources and capacity of our computers.
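One way to see this trade-off numerically is to track the acceptance rate for different proposal widths: tiny steps get accepted almost always but explore slowly, while huge steps get rejected most of the time, leaving long flat stretches in the trace. A rough sketch, with arbitrary step sizes of my own choosing:

```python
import math
import random

def target(x):
    # Same illustrative bimodal target as before.
    return math.exp(-0.5 * (x - 2.0) ** 2) + math.exp(-0.5 * (x + 2.0) ** 2)

def acceptance_rate(proposal_std, n_iters=2000, x0=0.0):
    random.seed(1)  # fixed seed so runs are reproducible
    accepted = 0
    x = x0
    for _ in range(n_iters):
        x_prop = random.gauss(x, proposal_std)
        if random.random() < min(1.0, target(x_prop) / target(x)):
            x = x_prop
            accepted += 1
    return accepted / n_iters

small = acceptance_rate(0.1)   # tiny steps: high acceptance, slow exploration
large = acceptance_rate(10.0)  # huge steps: many rejections, repeated samples
```

Neither extreme is good: a very high acceptance rate usually means the chain is barely moving, and a very low one means it is mostly standing still.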

I want to share with you one more thing about the Metropolis-Hastings approach, a really cool perspective on it, which tells us that Metropolis-Hastings can be considered a correction scheme. So if you have a slightly wrong version of your sampling scheme, you can correct it with Metropolis-Hastings.

Let's look at one example. Recall the Gibbs scheme, Gibbs sampling: we used to sample points one dimension at a time, and the Gibbs scheme is inherently not parallel, so we have to know all the information from the previous sub-steps to make the next sub-step. Okay, let's try to make it parallel. Let's break the Gibbs sampling scheme and use only the information from the previous iteration, so the sub-steps will not depend on each other. Then, if we have a million-dimensional space and a million computers, we can do every sub-step in parallel, and we'll be really, really fast.

But the problem is that we broke our scheme: we no longer have any convergence guarantees that it will converge to the true distribution we want. And it will not, actually; it will generate samples from some wrong distribution. What can we do here? Well, we can use this thing as a proposal distribution for Metropolis-Hastings, and then correct it with the Metropolis-Hastings acceptance step. And since this particular proposal distribution is not arbitrary, it's already almost right, almost generating points from your desired distribution, you will not reject too many points, because it is already almost the thing that you want. So you will just occasionally reject one point or another. But overall it may be much, much more efficient, because now you can do the Gibbs sub-steps in parallel.
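Here is a hedged sketch of this correction idea, using a 2-D Gaussian target where the "parallel Gibbs" proposal (each coordinate drawn from its conditional given the old value of the other coordinate) has a density we can evaluate. The correlation and all other constants are my own choices, and since this proposal is not symmetric, both Q terms stay in the acceptance ratio:

```python
import math
import random

random.seed(0)
RHO = 0.8            # correlation of the 2-D Gaussian target (illustrative)
S2 = 1.0 - RHO ** 2  # variance of each conditional distribution

def log_target(x1, x2):
    # Unnormalized log-density of a standard bivariate Gaussian, correlation RHO.
    return -(x1 * x1 - 2.0 * RHO * x1 * x2 + x2 * x2) / (2.0 * S2)

def log_q(y1, y2, x1, x2):
    # Log-density of the "parallel Gibbs" proposal: each new coordinate is drawn
    # from its conditional given the OLD value of the other coordinate.
    return (-(y1 - RHO * x2) ** 2 / (2.0 * S2)
            - (y2 - RHO * x1) ** 2 / (2.0 * S2))

def corrected_step(x1, x2):
    # Propose both coordinates in parallel from the old state.
    y1 = random.gauss(RHO * x2, math.sqrt(S2))
    y2 = random.gauss(RHO * x1, math.sqrt(S2))
    # Metropolis-Hastings correction: the proposal is asymmetric,
    # so the Q terms do not cancel.
    log_a = (log_target(y1, y2) + log_q(x1, x2, y1, y2)
             - log_target(x1, x2) - log_q(y1, y2, x1, x2))
    if log_a >= 0.0 or random.random() < math.exp(log_a):
        return y1, y2, True   # accepted
    return x1, x2, False      # rejected: stay put

x = (0.0, 0.0)
accepts = 0
for _ in range(1000):
    x1, x2, ok = corrected_step(*x)
    x = (x1, x2)
    accepts += ok
```

Because the proposal is already close to the right conditional structure, most steps are accepted, and the occasional rejection is exactly what restores the correct stationary distribution.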

So, to summarize. The Metropolis-Hastings approach is the rejection sampling idea applied to Markov chains. And the nice thing about this approach is that it gives you a whole family of Markov chains, and you can choose the one that you like. Another bright property of this algorithm is that it works for unnormalized probabilities and densities, just like Gibbs sampling, and it's also kind of easy to implement. It is a bit harder than Gibbs sampling, maybe, but still something like five lines of code. On the negative side, the samples are less correlated than in Gibbs sampling, but still somewhat correlated.

Also, the property that it gives you a whole family of Markov chains is kind of a double-edged thing. On the one hand, it gives you the flexibility to choose the right chain for your particular problem. But on the other hand, it forces you to think, to choose something, and that may be hard in some cases, and harder to automate. If something always works, you don't have to think at all; you can just automate the whole process. But to apply Metropolis-Hastings, you usually have to think a little bit and understand which proposal makes sense for your particular application.

Â [MUSIC]
