Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2



From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's talk about exact inference for odds ratios. This is the last thing I'll talk about in this lecture.

Let's let X be the number of smokers for the cases, and Y be the number of smokers for the controls. Remember that in this case X and Y are the random numbers, because we're thinking of case-referent sampling. The margins of 709 are fixed, and we're going to assume that both of them are, say, binomial. We want to calculate an exact confidence interval for the odds ratio, not an approximate one; the square root of the sum of one over the cell counts formula is the approximate one. I'll show you that you have to eliminate a nuisance parameter, and I'll show you how to do that.
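For contrast, the approximate interval mentioned above is the Wald interval on the log scale: take the log of the sample odds ratio, add and subtract a normal quantile times the square root of the sum of one over the cell counts, and exponentiate. A minimal sketch in Python, with made-up cell counts (the function name is mine, not from any library):

```python
import math

def approx_or_ci(a, b, c, d, z=1.96):
    """Approximate (Wald) CI for the odds ratio of the 2x2 table [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))   # log of the sample odds ratio
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # the one-over-the-cell-counts standard error
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# hypothetical counts, just for illustration
lo, hi = approx_or_ci(10, 10, 10, 10)
```

This is the interval the exact method is meant to improve on when the cell counts are small.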

So let's define the logit function as the log of the odds: logit(p) = log(p / (1 - p)). Notice that differences in logits are log odds ratios: logit(p1) minus logit(p2) is the log odds ratio for p1 relative to p2.

So as an example, let's define the logit of the probability of being a smoker, given that you're a case, as delta. By the way, this implies that the probability of being a smoker given that you're a case is e^delta / (1 + e^delta); you get that by inverting the logit function. Then let's call the logit of the probability of being a smoker, given that you're a control, delta plus theta. That's just a different way of describing that number relative to delta; because we're not constraining theta, it can still be anything. Then the probability of being a smoker given that you're a control works out to be e^(delta + theta) / (1 + e^(delta + theta)).

In this case, theta works out to be the log odds ratio. Of course it's the log odds ratio, because if we subtract the two logits, the delta cancels out and we get theta. So theta is the log odds ratio, and in the way we've parameterized this, the other parameter, delta, is the so-called nuisance parameter. We don't care about that. What we care about is the log odds ratio relating smoking to case status.
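A small numeric check makes the parameterization concrete (delta and theta are arbitrary made-up values): invert the logits to get the two probabilities, then take the difference of their logits and recover theta.

```python
from math import log, exp

def logit(p):
    # log odds: log(p / (1 - p))
    return log(p / (1 - p))

def expit(t):
    # inverse logit: e^t / (1 + e^t)
    return exp(t) / (1 + exp(t))

# arbitrary illustrative values for the nuisance parameter and the log odds ratio
delta, theta = -0.5, 1.2
p_case = expit(delta)             # P(smoker | case)
p_control = expit(delta + theta)  # P(smoker | control)

# the difference of the two logits recovers theta; delta has canceled
log_odds_ratio = logit(p_control) - logit(p_case)
```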

Okay, so let's keep working on the model here. We're going to assume that X is binomial with n1 trials, and we already stipulated that its logit is delta, so the success probability is e^delta / (1 + e^delta). Then Y is binomial with n2 trials and success probability e^(delta + theta) / (1 + e^(delta + theta)). So the probability that capital X takes on realized value little x is this binomial probability, and you can work with it to get to this formula right here: n1 choose x, et cetera.

Then, carrying this over from the previous slide, this is the probability that X takes on realized value little x. And then I'm going to look at the probability that Y takes on realized value z minus x; you'll hopefully see why in a minute. That's just plugging directly into the binomial formula, only with z minus x instead of a particular value, say, little y.

Okay, now, the probability that the random variable X plus Y takes on realized value z is a little bit harder to calculate, because X and Y are not identically distributed. If they were identically distributed, then X would be the sum of a bunch of Bernoulli trials and Y would be the sum of a bunch of Bernoulli trials, so X plus Y would be the sum of a bunch of identically distributed Bernoulli trials. As it is, they're still both sums of Bernoulli trials, but not with the same success probability, okay?

So here's what we can do: we can factor this. Let's suppose we decompose z into u and z minus u, with the u part going into X and the z minus u part going into Y. The probability of that is this product right here: the probability that X is u times the probability that Y is z minus u. So the probability that X plus Y takes on the value z is the sum over all the possible values of u; in other words, over all the different ways we could allocate some of the elements of z to X, with whatever remains allocated to Y. Okay, so that's a quick little formula you can use.
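That allocation argument is just a convolution of the two binomial probability mass functions. A quick sketch using nothing beyond the standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(Binomial(n, p) = k)
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sum_pmf(z, n1, p1, n2, p2):
    """P(X + Y = z): sum over all ways to allocate u successes to X and z - u to Y."""
    return sum(binom_pmf(u, n1, p1) * binom_pmf(z - u, n2, p2) for u in range(z + 1))
```

When p1 equals p2, this collapses to a single binomial with n1 + n2 trials, which makes a handy sanity check.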

Okay, now we're going to get to the point. Let's look at the probability that X takes on a particular value x, given that the sum X plus Y takes on a particular value z. I'm just going to plug in the three lines from above. The numerator is this probability right here: the probability that X equals x times the probability that Y equals z minus x. Just to elaborate on that point: the probability that X takes on value x and X plus Y takes on value z is, because we're stipulating that X has value x, the same thing as the probability that X takes on value x and Y takes on the value z minus little x, and we can factor those probabilities into the product. So that's the numerator right here, and for the denominator I'm just plugging directly in the probability that X plus Y takes on the value z.

Okay, so then you plug it all in, and you wind up with this formula right here. (If you can follow the mathematics, great; if you're having trouble with this, I realize it's a little bit in depth.) This is very similar to our development of Fisher's exact test. The only difference is that now we haven't assumed the null hypothesis to be true, and so what we have here depends on theta, the log odds ratio.

Okay. But notice it doesn't depend on delta, right? We've gotten rid of delta, and that's the idea of conditioning away the nuisance parameter: conditioning on X plus Y conditions away the nuisance parameter. Nonetheless, we now have a distribution for our two variables, because once I've conditioned on X plus Y, I don't need to talk about X and Y separately: if I know X, then I know Y, given that I know X plus Y.
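You can verify the cancellation numerically. Below is a sketch in plain Python, using the lecture's parameterization in which theta attaches to the controls: the conditional pmf depends only on theta, and the same numbers come out of the binomial model no matter what delta is.

```python
from math import comb, exp

def cond_pmf(x, z, n1, n2, theta):
    """P(X = x | X + Y = z). Delta has canceled; only theta, the log odds
    ratio, remains. The weight uses theta * (z - u) because, in the lecture's
    parameterization, theta sits on the controls (Y)."""
    lo, hi = max(0, z - n2), min(n1, z)
    weights = {u: comb(n1, u) * comb(n2, z - u) * exp(theta * (z - u))
               for u in range(lo, hi + 1)}
    return weights[x] / sum(weights.values())
```

At theta = 0 this reduces to the ordinary hypergeometric distribution, exactly as in Fisher's exact test.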

Okay. So you can use this distribution to calculate an exact hypothesis test for theta equal to a value theta-naught other than 0. The specific case of 0 results in Fisher's exact test, via the ordinary hypergeometric distribution. And then you can invert these tests to yield exact confidence intervals for the odds ratio.

And that is exactly what R does if you run fisher.test: it'll give you a confidence interval for the odds ratio, and it is doing exactly this procedure right here. It's inverting this distribution, which is called the non-central hypergeometric distribution.
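R's fisher.test is the real tool here; purely to illustrate the inversion idea, here is a from-scratch Python sketch (the function names are mine, not a library API). It bisects over theta for each of the two one-sided exact tests and exponentiates the endpoints; the weights are computed directly, so it is only suitable for small tables.

```python
from math import comb, exp

def nchg_pmf(x, z, n1, n2, theta):
    """Non-central hypergeometric pmf of X given X + Y = z, with theta the
    log odds ratio attached to the controls, as in the lecture."""
    lo, hi = max(0, z - n2), min(n1, z)
    w = {u: comb(n1, u) * comb(n2, z - u) * exp(theta * (z - u))
         for u in range(lo, hi + 1)}
    return w[x] / sum(w.values())

def exact_or_ci(x, z, n1, n2, alpha=0.05, tol=1e-8):
    """Invert two one-sided exact tests over theta into a CI for the odds ratio."""
    lo_s, hi_s = max(0, z - n2), min(n1, z)
    def upper_tail(t):  # P(X >= x; theta = t): decreasing in t under this parameterization
        return sum(nchg_pmf(u, z, n1, n2, t) for u in range(x, hi_s + 1))
    def lower_tail(t):  # P(X <= x; theta = t): increasing in t
        return sum(nchg_pmf(u, z, n1, n2, t) for u in range(lo_s, x + 1))
    def solve(f, increasing, a=-30.0, b=30.0):
        # bisection for the theta where f(theta) = alpha / 2
        while b - a > tol:
            m = (a + b) / 2
            if (f(m) < alpha / 2) == increasing:
                a = m
            else:
                b = m
        return (a + b) / 2
    theta_lo = solve(lower_tail, increasing=True)   # below this, P(X <= x) < alpha/2
    theta_hi = solve(upper_tail, increasing=False)  # above this, P(X >= x) < alpha/2
    return exp(theta_lo), exp(theta_hi)
```

For example, with 3 of 5 cases and 4 of 8 controls exposed (so z = 7), the resulting interval comfortably covers 1, as you'd expect from so little data.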

We're not going to go through any calculations with this, because, as you can tell, at this point it's gotten rather involved. But I did just want to show everyone where these exact odds ratio calculations come from. They basically come from this formulation of the problem as a non-central hypergeometric distribution.

So what I'm hoping you got from today's lecture was a little bit of information about the odds ratio and some of its more general-purpose uses, for example in case-control studies, and also a little bit about where some of the more complex formulas for performing inference on the odds ratio come from.

Â Coursera provides universal access to the worldâ€™s best education,
partnering with top universities and organizations to offer courses online.