
Hi, my name is Brian Caffo, and this is lecture two of Mathematical Biostatistics Boot Camp. In the last lecture, we covered probability and the basics of biostatistics at a very conceptual level. In this lecture, we are going to get much more down to specifics. First, we are going to cover probability measures, which are mathematical functions, so we will talk about the specifics of those kinds of mathematical functions. Then, in section two, we will talk about random variables. Random variables are just like any other variables that you may have encountered before, say in calculus, with the exception that they are random: they can take lots of different values. In section three, we're going to talk about probability mass functions and probability density functions. These are mathematical functions that assign probabilities to the values of random variables. In section four, we're going to talk about so-called cumulative distribution functions (CDFs), the closely associated things called survival functions, and quantiles. And then we'll wrap up with a brief summary.

So a probability measure is the function

that's going to govern the rules of probability for us. There are basically three rules that a probability measure has to follow, and every probability textbook will give you these three rules, or something equivalent to them.

There is an interesting history behind these three rules. The Russian mathematician Kolmogorov, who is generally considered the father of modern probability, basically distilled everything that we thought a probability should have to satisfy down to the minimal set of rules you could possibly have. If you delete any of these rules, you wind up with something that fails in some fundamental way to be a probability; if you add any other rules, they turn out to be superfluous. So it's really an interesting collection of research he did. It's also interesting to note that [inaudible] tried to do something else, which is to figure out what exactly it is we mean by probability. He found that problem to be very hard, and I think if you look into it, the theory of exactly what randomness is, and exactly what a probability measure is, is a very deep problem; philosophers are still debating it, and I question whether or not it will ever reach a resolution. However, one thing that's much less controversial is what rules probability has to follow; there, Kolmogorov just nailed it, it's done. So let's go over these three rules.

So a probability measure P, the letter P here in italics, is a function that maps events, which are subsets of the sample space, to numbers between zero and one; that's item one here. So events E have to be mapped to numbers between zero and one: probability is a function that operates on sets.

The second item here says that the probability of the whole sample space has to be one. Basically, what this means is that something has to happen; the sample space has to enumerate everything possible that can happen. So, for example, if you are flipping a coin, the coin can either be heads or tails. The sample space is {heads, tails}; when you flip the coin, one of those two things has to happen, and the probability of one of them happening is one. The coin can't land on its side. If you want to allow the coin to land on its side, then the sample space has to be heads, tails, and landing on its side.

The third statement (and we will talk a lot more about the third statement, because we are giving you a slightly incomplete version of it) says that if two events are mutually exclusive, and recall that events are mutually exclusive if they have no intersection, then the probability of their union is the sum of their probabilities.
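The three rules are easy to check directly for a concrete probability measure. Here is a minimal sketch, not part of the lecture, assuming a fair six-sided die with equally likely outcomes:

```python
# Sketch: the classical probability measure for a fair six-sided die,
# with the three Kolmogorov rules checked directly.
omega = frozenset(range(1, 7))          # sample space

def prob(event):
    """P(E) = |E| / |omega| for equally likely outcomes."""
    return len(event) / len(omega)

# Rule 1: events map to numbers between zero and one.
for e in (frozenset(), frozenset({1}), frozenset({2, 4, 6}), omega):
    assert 0 <= prob(e) <= 1

# Rule 2: the whole sample space has probability one.
assert prob(omega) == 1

# Rule 3: if E1 and E2 are mutually exclusive, P(E1 u E2) = P(E1) + P(E2).
e1, e2 = frozenset({1}), frozenset({2, 3})
assert e1 & e2 == frozenset()           # no intersection
assert abs(prob(e1 | e2) - (prob(e1) + prob(e2))) < 1e-12
```

The particular events used in rule 3 are arbitrary illustrative choices; any pair of disjoint subsets of the sample space would do.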

So, as an example, we just talked about coin flipping; we said the probability of getting a head or a tail has to be one. Let's talk about that in the context of rule three. If E1 is the event that you get a head and E2 is the event that you get a tail, then the probability of E1 union E2, the probability that you get a head or a tail, winds up being the probability of getting a head, let's say 0.5, plus the probability of getting a tail, which is 0.5, and that adds up to one, exactly what we know it has to be.

Now, for part three, the third rule that we talked about on the previous slide, I said that there was some concern over it not being complete, so I'm going to elaborate on what I mean by that in this slide.

First of all, let's note the following fact. Part three of the previous slide, the fact that if you have two mutually exclusive events the probability of their union is the sum of their probabilities, extends pretty easily to so-called finite additivity: instead of having two events, if you have three, or four, or five, or let's just say n events, the probability of their union equals the sum of their probabilities. So in this case I have that the probability of the union of a collection of mutually exclusive events Ai equals the sum of their probabilities.

That follows pretty directly from the previous definition. Just to give you a sense of how it works, suppose you had three events, say A1, A2, and A3, and they are all mutually exclusive.

Then the probability of A1 union A2 union A3 is the probability of A1 plus the probability of A2 union A3, because A1 is mutually exclusive from the union of A2 and A3. And then that second probability, the probability of A2 union A3, is again the probability of the union of two mutually exclusive events, so it is the probability of A2 plus the probability of A3. You can formalize this with mathematical induction if you want. So, at any rate, the rule that I gave you implies so-called finite additivity.

And it seems like maybe that should be enough to cover everything. Well, the probabilists have thought very hard about this, and they said: maybe it should be countable additivity; instead of going up to n, the union should go up to infinity.

And it turns out that the definition we gave does not imply countable additivity: the statement that if you take an infinite collection of mutually exclusive events, the probability of the union is the sum of the probabilities. That requires ideas of limits and other things that we're not going to cover so much in this class. So, at any rate, it's the case that finite additivity does not imply countable additivity, but of course countable additivity implies finite additivity. In the more theoretical probability classes, they make quite a bit of hay out of this distinction; they discuss it a lot. And the general definition requires countable additivity rather than finite additivity.
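Finite additivity itself is easy to see numerically. A small sketch, again assuming a fair die and some arbitrarily chosen disjoint events (not the lecture's example):

```python
# Sketch: finite additivity for pairwise disjoint events of a fair die.
# P(A1 u ... u An) = P(A1) + ... + P(An) when the Ai are mutually exclusive.
from functools import reduce

omega = frozenset(range(1, 7))

def prob(event):
    return len(event) / len(omega)

# Three pairwise disjoint events (hypothetical choices for illustration).
events = [frozenset({1}), frozenset({2, 3}), frozenset({5})]

union = reduce(frozenset.union, events)
assert abs(prob(union) - sum(prob(a) for a in events)) < 1e-12
```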

If you take a more advanced, measure-theoretic probability class, they will deal with this issue at length. In this class, this will be the last time we discuss it; in general, finite additivity will work just fine for us.

Next, we are going to talk in more detail about what the probability function operates on. Again, it's a rather important but, for this class, maybe unnecessary detail, so it's another thing that we cover very briefly and then tend not to think about for the remainder of the lectures. Recall that our probability function operates on events, which are subsets of the sample space, and maps them to numbers between zero and one. So we need an appropriate domain for our function, and that domain is not a single event; it's a collection of events. Let me go through an example to make this idea a little more clear.

So let's suppose the sample space is simply the numbers one, two, and three; imagine somehow you had a three-sided die that you were rolling.

Then the probability function operates on all possible events that are subsets of that sample space. So in this case: the null event; the event that you get a one; a two; a three; a one or a two; a one or a three; a two or a three; or the whole sample space, a one, two, or three.
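For a finite sample space, that collection of events is just the set of all subsets, which we can enumerate directly. A quick sketch (an illustration, not part of the lecture):

```python
from itertools import combinations

# Sketch: the domain F of the probability measure for omega = {1, 2, 3}
# is the collection of all 2^3 = 8 subsets of the sample space.
omega = {1, 2, 3}

F = [set(c) for r in range(len(omega) + 1)
            for c in combinations(sorted(omega), r)]

assert len(F) == 2 ** len(omega)   # 8 events, from the null event to omega
assert set() in F and omega in F
```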

And this is fine: pretty much whenever you have a finite sample space, the domain of the probability function will be all possible subsets of the sample space. In this case we're using the script letter F to denote this so-called domain.

When the sample space is a continuous set, it actually gets a lot harder, and you can no longer say that the probability operates on the set of all possible subsets of a continuous set. It turns out that that is an incredibly deep mathematical problem. The mathematician Cantor thought about measure and sets in a very deep way, and if you want to read about an interesting character in the history of mathematics, you should read about Cantor. He came up with interesting sets that, for example, you can't reasonably include in the definition of a probability. So in this class we're not going to think about this at all.

But I wanted to raise it for those students who go on to take some of these more advanced classes, so that you'll be prepared for some of the admittedly kind of strange ideas that come up when you try to talk about the set of sets that probabilities operate on. For our purposes, when our sample space is a continuous set, we are mostly going to be concerned with things like intervals or unions of intervals, and in that case the definitions are very easy. For our definition of the domain that the probability operates on, we are just going to assume that anything we can think of is just fine (and since none of us are Cantor, we probably won't think of anything too crazy). That definition works very well for this class.

In this slide, we're going to give a laundry list of properties that a probability function has to have by virtue of its three defining rules. You should find it kind of interesting that the three rules imply all of these things that we know probabilities have to have.

So take this first bullet here: the probability of the null set is zero; basically, the probability that nothing happens is zero. If you say you're going to roll a die, you actually roll a die; if you say you're going to flip a coin, you actually flip a coin. That's basically what "the probability of the null set is zero" means. The second bullet says the probability of

an event is one minus the probability of its complement. In other words, for example, if E is the event that you get a head when you flip a coin, the probability of getting a head is one minus the probability of getting a tail. That's of course true for a fair coin, where the probability of a head is 0.5 and the probability of a tail is 0.5. But let's suppose you have an unfair coin (maybe you glued together a nickel and a US dime and made a funny-shaped coin), so that you didn't know whether the probability of a head was 0.5; let's suppose the probability of a head in that case was 0.3. Well, this rule says that if the probability of a head is 0.3, then the probability of a tail has to be 0.7.
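The complement rule is a one-liner to check; a sketch using the lecture's hypothetical 0.3 head probability:

```python
# Sketch: complement rule, P(E) = 1 - P(E^c).
p_head = 0.3                # the lecture's hypothetical unfair coin
p_tail = 1 - p_head

assert abs(p_tail - 0.7) < 1e-12
assert abs(p_head + p_tail - 1.0) < 1e-12   # something has to happen
```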

The next bullet says that the probability of the union of two events is the sum of their probabilities; that's all we would have to say if the events were mutually exclusive, but if they are not, we have to subtract off the probability of the intersection. The intuition behind this statement is something like this: when you add the probability of A, you've added the part of A that intersects B and the part of A that does not intersect B. Then you've added the probability of B, which includes the part of B that intersects A and the part of B that does not intersect A. So the part where A and B intersect has been added twice: once when you added the probability of A, and once when you added the probability of B. You only want to add it once, so you subtract it out.

That's how the rule works. The next bullet point is a pretty simple point: if A is a subset of B, then the probability of A is less than or equal to the probability of B. This is analogous to saying that if I am rolling a die, and A is, say, the event that I get a one, and B is the event that I get a one or a two, then the probability of getting a one is less than the probability of getting a one or a two. So this rule, I think, makes a lot of sense. From De Morgan's laws we get that the probability of A union B is one minus the probability of A-complement intersect B-complement.

The next bullet point is kind of along the lines of subtraction. The set A intersect B-complement is sort of like subtracting B out of A: the part of A that has nothing to do with B. The probability of A removing B is the probability of A minus the probability of A intersect B, so it works out to be a nice rule that set-level subtraction is equivalent to subtracting the probabilities.

The next bullet talks about the probability of the union of events again. It says the probability of the union of a collection of events is less than or equal to the sum of the probabilities of the events. Now again, if the events are mutually exclusive, then the probability of the union has to equal the sum of the probabilities; so this rule doesn't contradict that one whatsoever, but it also accounts for the times when the events are not mutually exclusive.

The final rule talks, again, about unions of events. In this case, the probability of the union of events is at least as big as the maximum of the collection of probabilities. Again, this rule holds whether or not the events are mutually exclusive, and there's intuition behind it that's very easy.

The union is everything that's in any of the events E1 to En, so it contains each of them; the probability of the union therefore has to be at least as big as the probability of any of its component events. I think that makes quite a bit of sense. Just to give you an example, go back to our die roll: if E1 is the event that you get a one, E2 is the event that you get a two, and E3 is the event that you get a three, the probability on the left-hand side of the inequality is the probability that you get a one, two, or three, and the right-hand side is the maximum probability. If you are talking about a standard die, the probability of a one is 1/6, the probability of a two is 1/6, and the probability of a three is 1/6, so the maximum of them is 1/6. On the left-hand side, the probability of the union is the probability of a one, two, or three, which is one half. So one half is definitely bigger than 1/6.
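All of the bullet-point properties above can be spot-checked numerically. Here is a minimal sketch using two arbitrarily chosen fair-die events (an illustration, not the lecture's code):

```python
# Sketch: spot-checking the laundry list with fair-die events.
omega = frozenset(range(1, 7))

def prob(e):
    return len(e) / len(omega)

A, B = frozenset({1, 2, 3}), frozenset({3, 4})   # hypothetical events

# Inclusion-exclusion: P(A u B) = P(A) + P(B) - P(A n B)
assert abs(prob(A | B) - (prob(A) + prob(B) - prob(A & B))) < 1e-12

# Monotonicity: A subset of B implies P(A) <= P(B)
assert prob(frozenset({1})) <= prob(frozenset({1, 2}))

# De Morgan: P(A u B) = 1 - P(A^c n B^c)
assert abs(prob(A | B) - (1 - prob((omega - A) & (omega - B)))) < 1e-12

# Set-level subtraction: P(A n B^c) = P(A) - P(A n B)
assert abs(prob(A & (omega - B)) - (prob(A) - prob(A & B))) < 1e-12

# Union bound, and the maximum lower bound
assert prob(A | B) <= prob(A) + prob(B)
assert prob(A | B) >= max(prob(A), prob(B))
```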

So let me give you an example of one of these proofs. Let's take a simple one: the probability of an event is one minus the probability of its complement. Consider line one. Recall that the probability of the whole sample space is one. But the sample space, for any event, is equal to the union of that event and its complement; so Omega equals E union E-complement. Then consider the next line. An event is always mutually exclusive with its complement: something cannot simultaneously occur and not occur. So E and E-complement are mutually exclusive events, and we can take the probability of the union and turn it into the sum of the probabilities: the probability of the event plus the probability of the complement. And that's simply a restatement of what we want to prove: one equals the probability of the event plus the probability of the complement.
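Written out, the slide's argument (my reconstruction of the lines being narrated) is:

```latex
\begin{align*}
1 &= P(\Omega)       && \text{rule 2} \\
  &= P(E \cup E^c)   && \Omega = E \cup E^c \\
  &= P(E) + P(E^c)   && E \text{ and } E^c \text{ are mutually exclusive,}
\end{align*}
so that $P(E) = 1 - P(E^c)$.
```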

Let's do a more complex example of the consequences of the probability rules. Recall that the probability of the union of a collection of events is less than or equal to the sum of the probabilities, and recall that the "less than or equal to" is an equality if the events are mutually exclusive. Let's prove this using mathematical induction. The way mathematical induction works is that you prove the statement for some small case, say one or two events, then you assume it's true for n minus one events, and prove that it's true for n.

So let's consider just two events: the probability of E1 union E2. By one of the other consequences of the probability rules that we investigated, that's equal to the probability of E1 plus the probability of E2 minus the probability of E1 intersect E2 (and here I'm assuming that we've gone ahead and proved that one as well). The final term that's subtracted off, minus the probability of E1 intersect E2, is a number that has to be non-negative; remember, probabilities have to be between zero and one. So if we throw away that final term, what's left can only get bigger, right? If we're subtracting off a non-negative number and we throw it away, the result can only get bigger. So we've established the result for the case when we have two events.

Now let's assume the result is true when we have n minus one events, and consider n events. We want to demonstrate that the probability of the union of the Ei is less than or equal to the sum of the probabilities. So let's write the union of the Ei as En unioned with the union of the rest of them. The union of the rest of them, i equal one to n minus one, is a single set. So we have just two sets, En and the union of the remainder, and we already worked the result out for two sets. So we can say that the probability of the union of E1 to En is less than or equal to the probability of En plus the probability of the union of the remainder.

Now consider the next line. By our induction hypothesis (the assumption that the statement is true for n minus one events), if we switch from the probability of the union of the remainder to the sum of their probabilities, we have only made things bigger, so we can maintain the inequality. Then, just collecting the terms, we have that this is the sum of the probabilities.

And just to give you a sense of the notation I use: when I write "equals" on the last line, I mean it equals the previous line, not that it's equal to the first line. So the chain reads less-than-or-equal-to, less-than-or-equal-to, equals, implying that the final statement is less than or equal to the first statement, but equal to the previous line. That's notation that I commonly use.
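The chain of steps being described, written out (my reconstruction of the slide):

```latex
\begin{align*}
P\!\left(\bigcup_{i=1}^{n} E_i\right)
  &= P\!\left(E_n \cup \bigcup_{i=1}^{n-1} E_i\right) \\
  &\le P(E_n) + P\!\left(\bigcup_{i=1}^{n-1} E_i\right)
      && \text{two-event case} \\
  &\le P(E_n) + \sum_{i=1}^{n-1} P(E_i)
      && \text{induction hypothesis} \\
  &= \sum_{i=1}^{n} P(E_i).
\end{align*}
```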

So you should be able to prove all of the probability statements that we outlined on the previous slide. For this particular one, let's go ahead and take a step back from the mathematics and try to put some of this in context.

The National Sleep Foundation reports that around three percent of the American population has sleep apnea, a sleep disease in which the upper airway collapses. They also report that around ten percent of the North American and European population has restless leg syndrome; for the purposes of our discussion, let's just assume that ten percent of the American population has restless leg syndrome. Similarly, they report that 58 percent of adults in the US experience insomnia. So imagine you were a sleep physician and you wanted to know the probability that a random American has any of these three sleep disorders. Can you simply add these probabilities (three percent, ten percent, and 58 percent) and conclude that 71 percent of people have at least one of these sleep problems?

This question is nothing other than a restatement of the probability relationship that we just proved. Here I am using A instead of E, but maybe that's a good thing, just so you get used to not using the same letter for everything. So let A1 be the event that the person has sleep apnea, A2 the event that the person has restless leg syndrome, and A3 the event that the person has insomnia. I'm going to gloss over the details, but the probability that a person has at least one of these conditions is the probability of the union, A1 union A2 union A3. Well, that's only equal to the sum of the probabilities when A1, A2, and A3 are mutually exclusive. Otherwise, it's the probability of A1 plus the probability of A2 plus the probability of A3, and then we have to subtract out other things. In this case, I give you the exact equation relating the probability of the union of three events to the individual probabilities: the sum works out to 0.71, but then there's all the other stuff. You subtract the probabilities of A1 intersect A2, A1 intersect A3, and A2 intersect A3, and then you have to add back in the triple intersection, A1 intersect A2 intersect A3.
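The three-event inclusion-exclusion formula, and the sense in which 0.71 is only an upper bound, can be sketched like this. The overlap values below are hypothetical placeholders, since the lecture does not report them:

```python
# Sketch: inclusion-exclusion for three events,
# P(A1 u A2 u A3) = P(A1) + P(A2) + P(A3)
#                   - P(A1 n A2) - P(A1 n A3) - P(A2 n A3)
#                   + P(A1 n A2 n A3)
p1, p2, p3 = 0.03, 0.10, 0.58        # reported prevalences

naive = p1 + p2 + p3                 # by Boole's inequality, only an upper bound
assert abs(naive - 0.71) < 1e-12

def union3(p1, p2, p3, p12, p13, p23, p123):
    return p1 + p2 + p3 - p12 - p13 - p23 + p123

# Hypothetical overlap probabilities, purely for illustration:
p_union = union3(p1, p2, p3, p12=0.01, p13=0.02, p23=0.05, p123=0.005)
assert p_union <= naive              # the true probability can only be smaller
```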

I would suggest you go through and figure out exactly why this formula works. But the point is that the other stuff is non-trivial, and it's always there unless A1, A2, and A3 are mutually exclusive; so you can't simply add the three probabilities. And in fact, in this case, from a scientific perspective (we've been talking about it from a mathematical perspective), it's probably the case that there's a non-trivial intersection of people with sleep apnea and restless leg syndrome, a non-trivial intersection of people with restless leg syndrome and insomnia, and so on, so that 0.71 is not close at all.

So that ends our whirlwind tour of the basics of probability mathematics. Next, we're gonna talk about random