0:00

So welcome to part two of our probability review. This video assumes you've already watched part one, or at least that you're familiar with the concepts covered in part one, namely sample spaces, events, random variables, expectation, and linearity of expectation. In this part of the review we're going to be covering just two topics: conditional probability, and the closely related topic of independence, both between events and between random variables. I want to remind you that this is by no means the only source you can or should use to learn this material. A couple of other free sources that I recommend are lecture notes that you can find online by Eric, and also there's a wikibook on discrete probability. So, conditional probability, I hope you're not surprised to hear, is fundamental to understanding randomized algorithms. That said, in the five weeks we have here, we'll probably only use it once, and that's in analyzing the correctness of the random contraction algorithm for computing the minimum cut of an undirected graph. So, just to make sure

we're all on the same page, here's some stuff you should have learned from part one of the probability review. You should know what a sample space is: this represents all of the different outcomes of the random coin flips, all of the different things that could happen. Often in randomized algorithm analysis, this is just all of the possible random choices that the algorithm might make. Each outcome has some known probability [inaudible], and, of course, the probabilities sum to one. Remember that an event is nothing more than a subset of omega. Omega is everything that could possibly happen; S is some subset of things that might happen. And, of course, the probability of an event is just the sum of the probabilities of all the outcomes that the event contains. So, let's talk about

conditional probability. One discusses the conditional probability of one event given a second event. So let X and Y denote two events, subsets of the same sample space. You might want to think about these two events X and Y in terms of a Venn diagram. We draw a box representing everything that could conceivably happen: that's omega. Then we draw a blob corresponding to the event X; that's some stuff that might or might not happen, who knows. And the other event Y is some other stuff which might or might not happen. In general these two events could be disjoint, that is, they could have no intersection, or they might have a non-trivial intersection, X intersect Y. Similarly, they need not cover omega; it's possible that neither X nor Y happens. What we're looking to define is the probability of the event X given the event Y. We write Pr[X | Y], read "X given Y". And the definition is, I think, pretty intuitive. "Given Y" means we assume that something in Y happened. Originally anything in omega could have happened; we didn't know what. Now we're being told that whatever happened lies somewhere in Y. So we zoom in on the part of the picture that contains Y; that's going to be our denominator. Our new world is the stuff in Y, that's what we know happened, and now we're interested in the proportion of Y that is filled up with X, what fraction of Y's area is occupied by stuff in X. So Pr[X | Y] = Pr[X intersect Y] / Pr[Y]; that is, by definition, the conditional probability of X given Y. Let's turn to a quiz, using our familiar example of rolling two dice.

To make sure that the definition of conditional probability makes sense to you. Okay, so the correct answer to this quiz is the third answer. Let's see why that is. What are the two events that we care about? We want to know the probability of X given Y, where X is the event that at least one die is a one, and Y is the event that the sum of the two dice is seven. Now, the easiest way to explain this is to zoom in, to drill down on Y, and figure out exactly which outcomes Y comprises. For the sum of the two dice to be seven, we saw in the first part of the review that there are exactly six outcomes which give rise to the sum seven, namely the ordered pairs (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1). Now, remember that the probability of X given Y is, by definition, the probability of X intersect Y divided by the probability of Y. What you notice from this formula is that we actually don't care about the probability of X per se, or even about the event X per se, just about X intersect Y. So, now that we know Y has exactly six outcomes, which of those also belong to X? Well, X is the outcomes where at least one die is a one. So X intersect Y is just going to be the (1,6) and the (6,1). Now, each of the 36 possible outcomes is equally likely, so each one has probability 1/36. Since X intersect Y has only two outcomes, that's going to give us 2/36 in the numerator. Since Y has six outcomes, that gives us 6/36 in the denominator. When you cancel everything out, you're left with one third. So, just applying the definition of conditional probability to the correct definitions of the two relevant events, we find that, indeed, a third of the time at least one die is a one, conditioned on the sum of the two dice being seven. Let's move on to the independence of two events.
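(As a quick aside, not part of the lecture: the quiz answer can be double-checked by brute-force enumeration. The Python below is my own sanity check; it just applies the definition Pr[X | Y] = Pr[X intersect Y] / Pr[Y] to the 36 equally likely outcomes.)

```python
from fractions import Fraction

# All 36 equally likely outcomes of rolling two dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

# Y: the sum of the two dice is seven.
Y = [(d1, d2) for (d1, d2) in outcomes if d1 + d2 == 7]
# X intersect Y: the sum is seven AND at least one die is a one.
X_and_Y = [(d1, d2) for (d1, d2) in Y if 1 in (d1, d2)]

# Pr[X | Y] = Pr[X intersect Y] / Pr[Y]; every outcome has probability 1/36.
pr_X_given_Y = Fraction(len(X_and_Y), 36) / Fraction(len(Y), 36)
print(pr_X_given_Y)  # 1/3
```

Exact fractions avoid any floating-point noise, so the 2/36 over 6/36 cancellation from the lecture comes out as exactly one third.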

We consider two events, X and Y. By definition, the events are independent if and only if the following equation holds: the probability that both of them happen, that is, the probability of X intersect Y, is exactly equal to the probability that X happens times the probability that Y happens. So that's a simple, innocuous-looking definition. Let me rephrase it in a way that's even more intuitive. I'll let you check this; it's just some trivial algebra. This equation holds for the events X and Y if and only if (this is just using the definition of conditional probability we had on the last slide) the probability of X given Y is exactly the same thing as the probability of X. So, intuitively, knowing that Y happened gives you no information about the probability that X happens. That's the sense in which X and Y are independent. And you should also check that this holds if and only if the probability of Y given X equals the probability of Y. So, symmetrically, knowing that X has occurred gives you no new information about whether or not Y has occurred. The probability of Y is unaffected by conditioning on X. So at this juncture I

feel compelled to issue a warning, which is: you may feel like you have a good grasp of independence, but in all likelihood you do not. For example, I rarely feel confident that I have a keen grasp on independence. Of course I use it all the time in my own research and my own work, but it's a very subtle concept. Your intuition about independence is very often wrong, even if you do this for a living. I know of no other source that's created so many bugs in proofs by professional mathematicians and professional computer science researchers as misunderstandings of independence, and using intuition instead of the formal definition. So, for those of you without much practice with independence, here's my rule of thumb for whether or not to treat random variables as independent. If things are independent by construction, for example, if you define your algorithm so that two different things are independent, then you can proceed with the analysis under the assumption that they're independent. If there's any doubt, if it's not obvious that two things are independent, you might want to, as a rule of thumb, assume that they're dependent until further notice. The slide after next will give you a concrete example showing things which are independent and things which are not independent. But before I do that, I want to talk about independence of random

variables rather than just independence of events. You'll recall from the first video on probability review that a random variable is just a real-valued function from the sample space to the real numbers. So once you know what happened, you have some number: the random variable evaluates to some real number. Now, what does it mean for two random variables to be independent? It means the events of the two variables taking on any given pair of values are independent events. So, informally, knowing the value taken on by one of the random variables tells you nothing about what value is taken on by the other random variable. Recalling the definition of what it means for two events to be independent, this just means that the probability that A takes on value little a and B takes on value little b, the probability that both of those happen, is the product of the probabilities that each happens individually. So, what's useful about independence of events is that probabilities just multiply. What's useful about independence of random variables is that expectations just multiply. We're going to get an analog of linearity of expectation, where we can freely interchange expectation and product. But I want to emphasize: this interchange of expectation and product is valid only for independent random variables, not in general, unlike linearity of expectation. And we'll see a non-example; we'll see how this fails on the next slide for non-independent random variables. I'll just state it for two random variables, but the same thing holds, by induction, for any number of random variables: if two random variables are independent, then the expectation of their product equals the product of their expectations. And again, do not forget that we need a hypothesis. Remember, linearity of expectation did not have any hypothesis; for this statement about products, we do have the hypothesis that the random variables are independent. So why is this true? Well, it's just a straightforward

derivation where you just follow your nose. I'll write it out here for completeness, but I really don't think it's that important. You start with the expectation of the product. This is just the average value of A times B, of course weighted by the probability of any particular value. The way we're going to group the sum is to sum over all possible combinations of values, little a and little b, that capital A and capital B might take on. So that gives us a times b, times the probability that capital A takes on the value little a and capital B takes on the value little b. That's just the definition, where a times b is the value of the random variable A times B, and the other factor is the probability that it takes on that value. Now, because A and B are independent, this probability factors into the product of the two probabilities. This would not be true if they were not independent; it's true because they're independent. So we have the same sum over all possible joint values little a and little b; you still have a times b, but now times the probability that A takes on the value little a, times the probability that B takes on the value little b. So now we just need to regroup these terms. Let's first sum over a, and yank out all the terms that depend on little a. Notice none of those depend on little b, so we can pull them out in front of the sum over little b. So we have little a times the probability that capital A takes on the value little a, and then the stuff that we haven't yanked out is the sum, over little b, of little b times the probability that capital B takes on the value little b. And what's here inside the parentheses? This is just the definition of the expectation of B. And then what remains after we have factored out the expectation of B? Just this other sum, which is the definition of the expectation of A. So indeed, for independent random variables, the expected value of the product is equal to the product of the expectations. Let's now

wrap up by tying these concepts together in an example, a simple example that nevertheless illustrates how it can be tricky to figure out what's independent and what's not. Here's the setup. We're going to consider three random variables: X1, X2, and X3. X1 and X2 we choose randomly, so each is equally likely to be zero or one. But X3 is completely determined by X1 and X2: it's going to be the XOR of X1 and X2. XOR stands for "exclusive or." What that means is that if both of the operands are zero, or if both of them are one, then the output is zero; and if exactly one of them is one and exactly one of them is zero, then the output is one. So it's like the logical OR function, except that if both of the inputs are true, then you output false, okay? That's exclusive or. Now, this is a little hand-wavy; when we start talking about probabilities, if we want to be honest about it, we should be explicit about the sample space. What I mean by this is that all four combinations of values for X1 and X2 are equally likely. So we could have (0,0), or (1,0), or (0,1), or (1,1), and in each of these four cases X3 is determined by the first two as the XOR: so you get a zero in the first case, a one in the second, a one in the third, and a zero in the fourth. And each of these four outcomes is equally likely. So let me

now give you an example of two random variables which are independent, and a non-example: two random variables which are not independent. So first, I claim that X1 and X3 are independent random variables. I'll leave this for you to check. This may or may not seem counterintuitive to you; remember, X3 is derived in part from X1. Nevertheless, X1 and X3 are indeed independent. And why is that true? Well, if you enumerate over the four possible outcomes, you'll notice that all four possible two-bit strings occur as values for X1 and X3. Here they're both zero, here they're both one, here you have a zero and a one, and here you have a one and a zero. So you've got all four of the combinations, each with probability one over four. It's just as if X1 and X3 were independent fair coin flips. So that's basically why the claim is true.
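(Another aside of mine, not from the lecture: this claim is easy to verify mechanically. The Python sketch below enumerates the four equally likely outcomes and checks that the joint distribution of (X1, X3) factors into the product of the marginals, which is exactly the definition of independence.)

```python
from collections import Counter
from fractions import Fraction

# The four equally likely outcomes, written as (x1, x2, x3) with x3 = x1 XOR x2.
outcomes = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)]

# Each of the four (X1, X3) value pairs occurs exactly once...
pairs = Counter((x1, x3) for (x1, _, x3) in outcomes)
print(pairs)  # every pair (a, b) in {0,1}^2 has count 1

# ...so Pr[X1 = a and X3 = b] = 1/4 = Pr[X1 = a] * Pr[X3 = b] for all a, b.
for a in (0, 1):
    for b in (0, 1):
        joint = Fraction(sum(1 for (x1, _, x3) in outcomes if (x1, x3) == (a, b)), 4)
        p_a = Fraction(sum(1 for (x1, _, _) in outcomes if x1 == a), 4)
        p_b = Fraction(sum(1 for (_, _, x3) in outcomes if x3 == b), 4)
        assert joint == p_a * p_b
```

The assertion passing for all four value pairs is precisely the statement that X1 and X3 behave like independent fair coin flips.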

Now, that's a perhaps counterintuitive example of independent random variables. Let me give you a perhaps counterintuitive example of dependent random variables. Needless to say, this example just scratches the surface, and you can find much more devious examples of both independence and non-independence if you look in, say, any good book on discrete probability. So now let's consider the random variables X1 times X3, and X2. The claim is that these are not independent. I'll give you a formal proof. The way I'm going to prove this is slightly sneaky: I'm not going to go back to the definition; I'm going to contradict a consequence of the definition. To prove that they're not independent, all I need to do is show that the product of the expectations is not the same as the expectation of the product. Remember, if they were independent, then we would have that equality: the product of expectations would equal the expectation of the product. So if that's false, then there's no way these random variables are independent. So, on one side, we have the expectation of the product of these two random variables, which is just the expected value of the product of all three. And on the other side, we look at the product of the expected value of X1 times X3, and the expected value of X2. Let's start with the expected value of X2. That's pretty easy to see: it's zero half the time and one half the time, so the expected value of X2 is one half. How about the expected value of X1 times X3? Well, from the first claim, we know that X1 and X3 are independent random variables. Therefore, the expected value of their product is just the product of their expectations, that is, the expected value of X1 times the expected value of X3. And again, X1 is equally likely to be zero or one, so its expected value is one half. X3 is equally likely to be zero or one, so its expected value is one half. So the product of their expectations is one fourth. So the right-hand side here is one eighth: one half times one fourth, that's an eighth. What about the left-hand side, the expected value of X1 times X3 times X2? Well, let's go back to the sample space. What is the value of the product in the first outcome? Zero. What is the value of the product in the second outcome? Zero. Third outcome? Zero. Fourth outcome? Zero. The product of all three random variables is zero with probability one. Therefore, its expected value is, of course, zero. So indeed, the expected value of the product of X1, X3, and X2 is zero, which does not equal one eighth, the product of the corresponding expectations. So this shows that X1 times X3 and X2 are not independent.
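(One last sanity check of my own, not from the lecture: the two sides of the calculation above, E[(X1·X3)·X2] = 0 versus E[X1·X3]·E[X2] = 1/8, can be confirmed by enumerating the four-point sample space.)

```python
from fractions import Fraction

# The four equally likely outcomes, written as (x1, x2, x3) with x3 = x1 XOR x2.
outcomes = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)]
p = Fraction(1, 4)  # probability of each outcome

# Left-hand side: E[(X1*X3) * X2]. The triple product is 0 in every outcome.
lhs = sum(p * (x1 * x3 * x2) for (x1, x2, x3) in outcomes)

# Right-hand side: E[X1*X3] * E[X2] = 1/4 * 1/2 = 1/8.
e_x1x3 = sum(p * (x1 * x3) for (x1, _, x3) in outcomes)
e_x2 = sum(p * x2 for (_, x2, _) in outcomes)
rhs = e_x1x3 * e_x2

print(lhs, rhs)  # 0 1/8 -- unequal, so X1*X3 and X2 cannot be independent
```

Since the expectation of the product differs from the product of the expectations, the multiplication rule for independent random variables fails, which is the contradiction the proof relies on.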
