So welcome to part two of our probability review. This video assumes you've already watched part one or at least are familiar with concepts covered in part one. Namely sample spaces, events, random variables, expectation and linearity of expectation. In this part of the review we're going to be covering just two topics. Conditional probability and the closer related topic of independence. Both between events and between random variables. I want to remind you that this is by no means the only source you can or should use to learn this material. A couple of other sources free that I recommend are lecture notes that you can find online by Eric. And also there's a wiki book on discrete probability. So, conditional probability, I hope you're not surprised to hear, is fundamental to understanding randomized algorithms. That said, in the five weeks we have here, we'll probably only use it once. And that's in analyzing the correctness of the random contraction algorithm for computing the minimum cut of an undirected graph. So, just to make sure we're all on the same page, here's some stuff you should have learned, from part one of the probability review. You should know what a sample space is. This represents all of the different outcomes of the random coin flips, all of the different things that could happen. Often in randomized algorithm analysis, this is just all of the possible random choices that the algorithm might make. Each outcome has some known probability [inaudible]. By and, of course, the sum of the probabilities equal one and remember that event is nothing more than a subset of omega. Omega is everything that could possibly happen. S is some subset of things that might happen and, of course, the probability of event is just the probability of, of all the outcomes that the event contains. So, let's talk about conditional probability. So one discusses the conditional probability of one event given a second event. So, let X and Y denote two events, subsets of the same sample space. You might want to think about these two events X and Y in terms of an event diagram. So we could draw a box, representing everything that could conceivably happen. So that's Omega. Then we can draw a blob corresponding to the event X. So that's some stuff. Might or might not happen, who knows. And then the other event Y is some other stuff which might or might not happen. And in general these two events could be disjoint, that is they could have no intersection. Or they might have a non-trivial intersection. X intersect Y. Similarly they need not cover omega. It's possible that nothing X nor Y happens. So what's we're looking to define is the probability of the event X given the even Y. So we write probability of X bar Y, phrased X given Y. And the definition is, I think, pretty intuitive. So given Y means we assume that something in Y happened. Originally anything in omega could have happened. We didn't know what. Now we're being told that whatever happened that lies somewhere in Y. So we zoom in on the part of the picture that, in which contains Y. So that's gonna be our denominator. So, our new world is the stuff in Y. That's what we know happened. And now we're interested in the proportion of Y that is filled up with X. So, we're interested in what fraction of Y's area is occupied by stuff in X. So X intersect Y, divided by the probability of Y. That is by definition the conditional probability of X given Y. Let?s turn to a quiz, using our familiar example of rolling two dice. To make sure that the definition of conditional probability makes sense to you. Okay, so the correct answer to this quiz is the third answer. So let's see why that is. So what are the two events that we care about? We want to know the probability of X given Y, where X is the event that at least one die is a one. And Y is the events that the sum of the two dice is seven. Now, the easiest way to explain this is let's zoom in, let's drill down on the Y. Let's figure out exactly which outcomes Y comprises. So the sum of the two dice, being seven, we saw in the first part of the review, there's exactly six outcomes which give rise to the sum seven, namely the ordered pairs one, six. Two five, three four, four three, five two, and six one. Now, remember that the probability. Of x given y is by definition the probability of x intersect y divided by the probability of y. Now, what you notice from this formula is we actually don't care about the probability of x per se or even about the event x per se, just about x intersect y. So, let's just fig, so, now we know why there has to be six outcomes. Which of those also belong to x? Well, x is those where at least one die is one. So, x intersect y is just going to be the one, six and the six, one. Now the probability of each of the 36 possible outcomes is equally likely. So each one is one over 36. So since X intersects Y, has only two outcomes. That's gonna give us two over 36 in the numerator. Since Y has six outcomes, that gives us a six over 36 in the denominator. When you cancel everything out, you're left with a one third. So just applying the definition of conditional probability to the correct definition of the two relevant events, we find that indeed a third of the time is when you have a one condition on the sum of the two being seven. Let's move on to the independence of two events. So. Again we consider two events, x and y. By definition, the events are dependent if and only if the following equation holds. The probability that both of them happen. That is the probability of x intersect y is exactly equal to the probability that x happens times the probability that y happens. So that's a simple innocuous looking definition. Let me re phrase it in a way that it's even more intuitive. So I'll you check this, it's just a some trivial algebra. This equation holds, for the events X and Y, if and only if, this is just using the definition of conditional probability we had on the last slide, if and only if the probability of X given Y, Is exactly the same thing as the probability of x. So, intuitively, knowing that y happens, gives you no information about the probability that x happens. That's the sense in which x and y are independent. And, you should also check that this holds if and only if, the probability of y, given x, equals the probability of y. So, symmetrically, knowing that X has occurs gives you no information, no new information about whether or not Y has occurred. The probability of Y is unaffected by conditioning on X. So at this juncture I feel compelled to issue a warning. Which is, you may feel like you have a good grasp of independence. But, in all likelihood, you do not. For example I rarely feel confident that I have a keen grasp on independence. Of course I use it all the time in my own research and my own work, but it's a very subtle concept. Your intuition about independence is very often wrong, even if you do this for a living. I know of no other source that's created so many bugs in proofs by professional mathematicians and professional computer science researchers as misunderstandings of independence and using intuition instead of the formal definition. So, for those of you without much practice with independence, here's my rule of thumb for whether or not you treat random variables as independent. If things are independent by construction, like, for example, you define it in your algorithm, so the two different things are independent. Then you can proceed with the analysis under the assumption that they're independent. If there's any doubt, if it's not obvious the two things are independent, you might want to, as a rule of thumb, assume that they're dependent until further notice. So the slide after next will give you a new example showing you things which are independent and things which are not independent. But before I do that I wanna talk about independence of random variables rather than just independence of events. So you'll recall a random variable is from the first video on probability review. It's just a real value function from the sample space to the real numbers. So once you know what happens you have some number. The random variable evaluates to some real number. Now, what does it mean for two random variables to be independent? It means the events of the two variables taking on any given pair of values are independent events. So informally, knowing the value taken on by one of the random variables tells you nothing about what value is taken on by the other random variable. Recalling the definition of what it means for two events to be independent, this just means that, the probability that A takes on value little a, B takes on value little b. The probability that both of those happen is just the product of the probabilities that each happens individually. So what's useful about independence of events is that probabilities just multiply. What's useful about independence of random variables is that expectations just multiply. So, we're going to get an analog of linear expectation where we can take, we can interchange an expectation in the product freely, but I want to emphasize this, this interchange of the expectation of the product is valid only for independent random variables and not in general, unlike linear expectations. And we'll see a non example. We'll see how this fails on the next slide for non independent random variables. So, I'll just state it for two random variables, but the same thing holds by induction for any number of random variables. If two random variables are independent, then the expectation of their product. Equals the product of their expectations. And again, do not forget that we need a hypothesis. Remember, linearity of expectations did not have a hypothesis for this statement about products. We do have a hypothesis of them being independent. So why is this true? Well, it's just a straight forward derivation where you follow your nose or write it out here for completeness, but, but I really don't think it's that important. So you start with the expectation of the product. This is just the average value of A times B, of course weighted by the probability of, of any particular value. So the way we're gonna group that sum is we're going to sum over all possible combinations of values, A and B, that capital A and capital B might take on, so that's gonna give us a value of A times B. Times the probability of that big A takes on the value of little a and capital B takes on the value of little b. So that's just by definition where this is the value of the random variable, capital A times capital B and this is the probability that it takes on that value with the values A and B. Now because A and B are independent, this probability factors into the product of the two probabilities. This would not be true if they were not independent. It's true because they're independent. So same sum where all possible joint values of all A and B. You still have A times B. But now we have times the probability that A takes on the value of A times the probability that B takes on the value of B. So now we just need to regroup these terms. So let's first sum over A. Let's yank out all the terms that depend on little a. Notice none of those depend on little b. So we can yank it out in front of the sum over little b. So I have an A times the probability that big A takes on the value of little a. And then the stuff that we haven't yanked out is the sum over b, of b times, little b times the probability that capital B takes on the value little b. And what's here inside the quantity? This is just the definition of the expectation of b. And then what remains after we have factored out the expectation of b? Just this other sum which is the definition of the expectation of a. So, indeed four independents random variables, the expected value of the product is equal to the product of the expectations. Let's now wrap up by tying these concepts together in an example, a simple example that nevertheless illustrates how it can be tricky to figure out what's independent and what's not. So here's the set up. We're going to consider three random variables. X1, X2 and X3. X1 and X2 we choose randomly, so they're equally likely to be zero or one. But X3 is completely determined by X1 and X2. So it's gonna be the XOR of X1 and X2. So XOR stands for exclusive or. So what that means is that if both of the operands are zero, or if both of them are one, then the output is zero. And if exactly one of them is one, exactly one of them is zero, then the output is one. So it's like the logical or function, except that both of the inputs are true, then you output false, okay? So that's exclusive or. Now this is a little hand wavy, when we start talking about probabilities, if we want to be honest about it, we should be explicit about the sample space. So what I mean by this, is that X1 and X2 take on all values, they're equally likely. So we could have a zero, zero or a one zero or a zero one or a one, one and in each of these four cases, X3 is determined by the first two, as the X or, so you get a zero here, a one here, a one here and a zero there. And each of these four outcomes is equally likely. So let me now give you an example of two random variables, which are independent, and a non example. I'll give you two random variables which are not independent. So first, I claim that, if you think that X1 and X3, then they're independent random variables. I'll leave this for you to check [sound]. This may or may not seem counter-intuitive to you. Remember X3 is derived in part from X1. Never the less, X1 and X3, are indeed independent. And why is that true? Well, if you innumerate over the four possible outcomes, you'll notice that all four possible two byte strings occur as values for one and three. So here they're both zero, here they're both one, here you have a zero and one, and here you have a one and zero. So you've got all four of the combinations of probability one over four. So it's just as if X1 and X3 were independent fair coin flips. So that's basically why the claim is true. Now. That's a perhaps counterintuitive example of independent random variables. Let me give you a perhaps counterintuitive example of dependent random variables. Needless to say, this example just scratches the surface and you can find much more devious examples of both independent and non-independents if you look in, say, any good book on discrete probability. So now let?s consider the random variable X1 product X3. And X two and the claim is these are not independent. So this'll give you a formal proof for. The way I'm going to prove this could be slightly sneaky. I'm not going to go back to the definition. I'm not gonna contradict the consequence of the definition. So it's proved that they're not independent all I need to do, is show that the product of the expectations is not the same as the expectations to the products. Remember if they were independent, then we would have that equality. [inaudible] Product of expectations will equal the expectation to products. So if that's false than there's no way these random variables are independent. So the expectation of the product of these two random variables is just the expected value of the product of all three. And then on the other side, we look at The product of the expected value of X1 and X3. And the expected value of X2. So let's start with the expected value of X2. That's pretty easy to see. That is zero half the time and that is one half the time. So the expected value of X2 is going to be one-half. How about the expected value of X1 and X3? Well, from the first claim, we know that X1 and X3 are independent random variables. Therefore, the expected value of their product is just the product of their expectations. Equal this expectations equal to the expected value of X1 times the expected value of X2, excuse me, of X3. And again, X1 is equally likely to be zero or one. So its expected value is a half. X3 is equally likely to be zero or one so its expected value is a half. So the product of their expectations is one-fourth. So the right-hand side here is one-eighth; one-half times one-fourth, so that's an eighth. What about the left-hand side, the expected value of X1 times X3 times X2? Well, let's go back to the sample space. What is the value of the product in the first outcome? Zero. What is the value of the product in the second outcome? Zero. Third outcome? Zero. Forth outcome? Zero. The product of all three random variables is always zero with probability one. Therefore, the expected value, of course, is gonna be zero. So indeed, the expected value of the product of X1, X3 and X2 zero does not equal to the product of the corresponding expectations. So this shows that X1, X3 and X2 are not independent.