Hi, my name is Brian Caffo. I'm in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, and this is Mathematical Biostatistics Boot Camp, lecture four. Today we're going to talk about random vectors and independence. Independence is a key ingredient in simplifying statistical models; it's a useful assumption, and we frequently use it in statistics to get a handle on complex phenomena. In addition, we'll find that independent and identically distributed random variables are going to be our canonical model for what we might think of as a random sample.

So let's just briefly talk about what we're going to cover today. Random vectors, which are simple collections of random variables. Then we'll talk about independence.

And you probably have a rough idea of what is meant by independence to begin with, but we're going to talk about the mathematical formalism a little bit. We'll talk about correlation, and then go over various mathematical properties of the correlation and covariance operators. Then we'll use our facts about independence, variance, and correlation to talk about properties of the sample mean. And then we'll cover the sample variance, and end with some discussion.

This lecture is actually one of the hardest things in all of statistics. I think if you can understand this lecture, you've understood what the goal of probability modeling, and this kind of population modeling, really is. You might want to consider listening to it over and over again. These are incredibly difficult concepts until you internalize them, and then once you internalize them they seem simple. So what I'm hoping to do in this lecture is to help you internalize them.

Okay. So a random vector is nothing other than an ordinary vector with random variables as its entries. So if X and Y are random variables, then the ordered collection (X, Y) is simply a random vector. Just like individual random variables have densities and mass functions, or distributions, that govern their probabilistic behavior, random vectors have joint densities, joint mass functions, and joint distribution functions that govern their probabilistic behavior. Let's just talk about densities and mass functions to begin with.

A joint density f(x, y) first of all has to be nonnegative everywhere. For a two-dimensional random vector, the surface f exists over the two-dimensional plane, and the fact that f is nonnegative means that its height everywhere is at or above the horizontal plane. And it has to integrate to one when you integrate over the whole xy-plane; so the height in the z direction has to be nonnegative, and the integral over the xy-plane has to be one. So it's a direct extension of the ordinary one-dimensional probability density function, and I think from this definition you should probably be able to guess what the definition of a joint density is for n random variables.

And then for discrete random variables, let's say f is now a joint probability mass function. The joint probability mass function f maps possible values of X and Y, so lowercase x and lowercase y are the possible combined values of X and Y, to probabilities. To satisfy the definition of a joint probability mass function, f has to be bigger than zero for all possible combinations of x and y, and the sum over all possible combinations has to equal one.

By the way, the joint density function works exactly like a univariate density function, in that volumes under it, so integrals under it, correspond to probabilities, and the total volume is one. In the same way, with the joint mass function, sums over collections of possible values of x and y yield the probability of that collection.
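A minimal numeric sketch of the two joint-density conditions, using a hypothetical density f(x, y) = 4xy on the unit square (and zero elsewhere); this particular f is just an illustration, not one from the lecture:

```python
# Check the two joint-density conditions for f(x, y) = 4xy on [0,1]^2:
# the surface is nonnegative, and the volume under it is one.

def f(x, y):
    return 4.0 * x * y  # nonnegative for x, y in [0, 1]

n = 400
h = 1.0 / n
grid = [((i + 0.5) * h, (j + 0.5) * h) for i in range(n) for j in range(n)]

# Condition 1: the surface never dips below the horizontal plane.
assert all(f(x, y) >= 0 for x, y in grid)

# Condition 2: the volume under the surface, approximated by a midpoint
# Riemann sum for the double integral over the unit square, is one.
total = sum(f(x, y) * h * h for x, y in grid)
print(round(total, 6))  # very close to 1.0
```

A region's probability would be the same kind of sum restricted to that region, mirroring the "volumes correspond to probabilities" point above.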

So for this class, a general discussion of random vectors is probably too much, so we're only going to focus on one specific, particularly manageable kind of joint density. And that's when the random variables X and Y are independent. What we'll see is that, for a joint density or joint mass function, if the random variables X and Y are independent, then the joint density just factors into the product of the two individual densities, f(x) times g(y).

Basically, this is a lot of what independence does for us mathematically: it turns complicated multivariate structures into products. We're going to use this fact a lot, and we'll explain some of the intuition behind it.

Thinking back to our early definitions of probability, where we were discussing the sample space and events: two events A and B are independent if the probability of their intersection is equal to the product of their probabilities, so the probability of A intersect B is the probability of A times the probability of B.

Incidentally, if this is true, then A is independent of B complement, B is independent of A complement, and A complement is independent of B complement. And the mathematical definition of independence is equivalent to our intuition of what it means to be independent: A is unrelated to B. That's what the mathematical definition implies, and we'll get a better sense of that.

For two random variables, we define independence like this: X and Y are independent if, for any two sets A and B, the probability that X lies in A and Y lies in B is the product of the probability that X lies in A, regardless of what Y is doing, and the probability that Y lies in B, regardless of what X is doing. So that's simply a direct extension of the definition of independence above, which I think probably everyone is a little bit familiar with.

We automatically think of independence all the time already. If you were to ask nearly anyone with a basic amount of mathematical training, what's the probability of getting two consecutive heads on two consecutive coin flips, they would probably say: well, the probability of getting a head on the first one is a half, and the probability of getting a head on the second one is a half, so it's a quarter, right?

Well, that's just an exact execution of the independence rule. Let A be the event that you get a head on flip one, and B be the event that you get a head on flip two. Basically, what you are saying is that you want the probability of the intersection, a head on flips one and two, and the probability of that intersection is exactly the product of the probabilities when we have independence: the probability of A times the probability of B, so 0.5 times 0.5, which is 0.25, or a quarter. So we use independence all the time, and the main consequence of independence is that probabilities of independent things multiply to obtain the probability of both occurring.
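The two-coin calculation can be spelled out in a few lines, with a quick simulation as a sanity check:

```python
import random

# Under independence, the probability of the intersection is the
# product of the probabilities.
p_a = 0.5           # P(A): head on flip one
p_b = 0.5           # P(B): head on flip two
p_both = p_a * p_b  # P(A and B) = P(A) * P(B)
print(p_both)       # 0.25

# A quick simulation of many pairs of fair-coin flips agrees.
random.seed(1)
n = 100_000
both_heads = sum(random.random() < 0.5 and random.random() < 0.5
                 for _ in range(n))
print(abs(both_heads / n - 0.25) < 0.01)
```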

But this creates a problem, in that people have then gone on to extend this rule to where they just multiply probabilities regardless of whether the events are independent, and this can lead to tragic consequences. Here's a great example.

In Science, Volume 309, they report on a physician who gave expert testimony in a criminal trial. He was giving expert testimony on sudden infant death syndrome, SIDS, which is this tragic phenomenon where a baby dies, for example, in the middle of the night, and no one exactly knows why. A woman was on trial because she had two consecutive children who died of SIDS, and the court case considered whether or not this was too unlikely to happen by chance; that it wasn't really SIDS, but something malicious on the part of the mother.

So the person who was testifying did the following calculation. He said, well, the probability of SIDS is one out of 8,543. I'm not 100 percent clear where they got that number, but let's assume for the sake of argument that it's correct. Then the person giving the testimony said, well then, the probability that you have two SIDS deaths would be the product of that number with itself, or the square of that number: one over 8,543 squared. Based on this evidence, the mother was convicted of murder.

So, what was this physician's mistake in this case? Beyond the purposes of this class, there is actually quite a bit of discussion you could have over ethics, probability, evidence, and culpability based on this case. There's quite a collection of complicated issues that intersect when you're discussing a case like this. For example, where and how does this probability of SIDS come from? What's the evidence for it? How do you balance medical evidence when convicting or not convicting a person in a trial?

For the purpose of this class, let's simplify the discussion down to: is this direct calculation, simply multiplying this number by itself, warranted, given that the number is correct? Well, if A1 is the event that the first child died of SIDS and A2 is the event that the second child did, then the inherent assumption being made is that A1 is independent of A2, so that you can multiply the probability of A1 times the probability of A2. But this logic fails immediately.

There's no reason to believe that the event of the second SIDS death is independent of the event of the first. In this case, and in many cases in biology, biological processes that have a genetic or familial component will tend to be dependent within families. So you couldn't multiply the marginal probabilities to obtain the probability of the intersection. And there are other problems with this estimate, and I outline an example of one here: the prevalence was obtained from an unpublished report on single cases, and quite a bit of the discussion surrounding this case revolved around these and other issues. But the point I'm trying to make for the purposes of this class is, you can't just go around multiplying probabilities willy-nilly. The random variables or events that you're discussing have to actually be independent.

Okay, so we'll use the following fact extensively in this class, as a basic simplifying principle. If we have a collection of random variables X1 up to Xn that are independent, then the joint density function, or joint mass function, of X1 to Xn is the product of the individual densities or mass functions. So, in other words, the density f of x1 up to xn is the product of the individual densities.
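Written out, the factorization just stated, and the identically-distributed special case described next, look like this:

```latex
% Independence: the joint density (or mass function) factors
f(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_i(x_i)

% IID special case: every X_i has the same density f
f(x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i)
```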

And here I have fi(xi), indicating that every Xi could potentially have a different density. The most common model that we'll be dealing with is the instance where X1, X2, all the way up to Xn are from the same distribution. In this particular case, we would say that the Xi's are independent and identically distributed: independent in that X1 is independent of X2, and so on, and identically distributed in that f1 is equal to f2, all the way up to fn. IID samples are very important in the subject of statistics, and the reason for that is that IID random variables are a basic kind of default model for random samples.

If you have a collection of things that we believe are, in essence, exchangeable, then we treat them as if they are IID. And many of the important theories of statistics are founded on the assumption that variables are IID. So, to give you an example of IID random samples, imagine simply rolling a die.

Each roll of a die is a draw from the uniform distribution on the numbers one to six. So when we model a process as if it's IID, we're saying it's as if we're rolling a die for each variable, drawing from some population-level distribution.
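The die-rolling picture is easy to simulate; the sample size and seed below are arbitrary choices for illustration:

```python
import random
from collections import Counter

# An IID sample: each roll is an independent draw from the uniform
# distribution on {1, ..., 6}.
random.seed(42)
rolls = [random.randint(1, 6) for _ in range(60_000)]

counts = Counter(rolls)
for face in range(1, 7):
    # Every face should show up with empirical frequency near 1/6 = 0.167.
    print(face, round(counts[face] / len(rolls), 3))
```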

I just want to comment on the broader discussion of probability modeling: this is never actually the case, right? IID is probably a very good model for rolling a die, but we use it to model things where surely the variables themselves are not IID. We can rarely guarantee that our sample is actually a random draw from some population distribution f over and over again. The point is that it's a statistical model used to simplify calculations and simplify our discussion. But whenever we use this statistical model, we have to be cognizant of the fact that it is a model, and it's an enormously simplifying assumption.

Let's go to a very important example: flipping a coin. Imagine we have a biased coin; remember, with a biased coin we say the probability of a head, or the success probability, is p, and we flip it n times. What is the joint density of the collection of possible outcomes? Recall, each coin flip here is a Bernoulli random variable with success probability p, and recall we wrote out the density in the form p to the x times one minus p to the one minus x. Notice that's a very convenient form: if you plug in x equals one, you get the probability p of a head, or a one, and if you plug in x equals zero, you get one minus p, the probability of a tail, or a zero.

So this density is a nice way to represent it, and you'll see why we present it specifically this way in the next line. The joint density, or joint mass function, f of x1 to xn, if they're independent coin flips, is simply the product of the individual densities, and you'll see from this formula that we get p raised to the sum of the xi's, times one minus p raised to n minus the sum of the xi's. So if the x's are all 0s and 1s, this works out to be p to the number of heads times one minus p to the number of tails. And that's basically why we write out the density this way: if we have a bunch of independent coin flips, it's convenient that the mass functions multiply, and we wind up with this nice form for the joint mass function.

So if, for example, I have a biased coin with success probability p, and I had four coin flips, and I wanted to know the probability of getting a one, then a zero, then a one, and then another one, so one, zero, one, one, you would simply plug into this formula. Notice that here we got three heads and one tail, so p to the three times one minus p to the one would be the probability of that occurrence.
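The one, zero, one, one calculation can be sketched directly from the formula; the value p = 0.3 below is a hypothetical success probability, chosen only for illustration:

```python
# The joint mass function of n independent Bernoulli(p) flips:
#   f(x_1, ..., x_n) = p^(sum of x_i) * (1 - p)^(n - sum of x_i)

def joint_pmf(xs, p):
    """Probability of observing the exact 0/1 sequence xs."""
    s = sum(xs)
    return p ** s * (1 - p) ** (len(xs) - s)

p = 0.3  # hypothetical success probability, just for illustration
prob = joint_pmf([1, 0, 1, 1], p)  # three heads, one tail: p^3 * (1-p)
print(round(prob, 6))              # 0.0189

# Only the counts of heads and tails matter, not the order.
print(joint_pmf([1, 0, 1, 1], p) == joint_pmf([1, 1, 1, 0], p))  # True
```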

And notice it's the same probability regardless of the order in which the 1s and 0s occurred. So this formula makes it easy to calculate the joint probability for a collection of 1s and 0s from a potentially biased coin.

I just want to mention again that this model is tremendously important. So, imagine for example that we want to model the prevalence of hypertension in a population.

One way we might go about doing that is to say that our sample is IID, and again that's often a big assumption: that people are IID draws, that individuals are coin flips, and what we would like to know is their success probability of having hypertension. That success probability is the prevalence of hypertension in the population, and we would use this joint mass function to model that process for our collection of data. That's the idea behind where we're going with this.

But notice there are a lot of assumptions that go into that, right? I just want to emphasize this fact quite a bit. We're assuming that we're randomly drawing people from the population that we're interested in, or not even that we're randomly drawing them, but that we can model the collection of people, their hypertension status, as if they were a bunch of independent coin flips with the prevalence being the success probability. That's ultimately what our model is stating, so it's important to always keep that in mind.

So let's stop here, and next we'll talk about some of the mathematical properties associated with random variables, covariance, and correlation, and their consequences when variables are independent.

Â