Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's go through our example. Our estimate, d-hat, of the difference in the marginal proportions works out to be 0.04. You can plug into the sigma-hat-d-squared formula here; sigma-hat-d (not sigma-hat-d squared) works out to be about 0.0095. And then the confidence interval you can do right here: it's the difference plus or minus two standard errors, which gives about 0.02 to 0.06. Notice what happens, though, if you ignore the dependence and just chop off this covariance term here and forget about it. Then you wind up with a significantly inflated standard error: sigma-hat-d works out to be about 0.0175.
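These numbers can be reproduced with a short script. This is a minimal sketch: the lecture only gives the discordant counts (150 and 86, used later in the exact test), so the concordant counts 794 and 570 below are assumed from the standard version of this approval-rating table; with them, the quoted 0.04, 0.0095, and 0.0175 all come out.

```python
import math

# Matched 2x2 table of approval (yes/no) at two times.
# Discordant counts 150 and 86 appear in the lecture; the concordant
# counts 794 and 570 are assumed for illustration.
n11, n12, n21, n22 = 794, 150, 86, 570
n = n11 + n12 + n21 + n22

p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
p1_plus = p11 + p12   # marginal approval at time 1
p_plus1 = p11 + p21   # marginal approval at time 2

d = p1_plus - p_plus1  # difference in marginal proportions: 0.04

# Variance of d-hat, keeping the covariance of the two dependent
# marginal proportions.
var_d = (p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
         - 2 * (p11 * p22 - p12 * p21)) / n
se_d = math.sqrt(var_d)                               # about 0.0095

# Standard error if the dependence is (wrongly) ignored:
se_naive = math.sqrt((p1_plus * (1 - p1_plus)
                      + p_plus1 * (1 - p_plus1)) / n)  # about 0.0175

ci = (d - 2 * se_d, d + 2 * se_d)                     # about (0.02, 0.06)
print(d, se_d, se_naive, ci)
```

Note that dropping the covariance term nearly doubles the standard error here, which is the point of the slide.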

So there is kind of an interesting relationship between the Cochran-Mantel-Haenszel test and the matched 2 by 2 table. Imagine you took each pair and represented their responses, yes or no, at the first and second times. We can really think of this as an extremely stratified setting, right? Every stratum just has the two measurements, first and second. If you do that, then there are only four possible tables (rows: time first, second; columns: response yes, no): 1 1 over 0 0, 1 0 over 0 1, 0 1 over 1 0, and 0 0 over 1 1. Okay? So imagine you represented all the pairs as tables like this; I hope you can agree with me that if you knew all these tables, you would exactly reproduce the matched 2 by 2 table.

So here's a kind of famous old result: McNemar's test is equivalent to the Cochran-Mantel-Haenszel test where the subject is the stratifying variable, and each 2 by 2 table is the observed pair table from the previous slide. And again, I'll note that this representation is only interesting for conceptual purposes. But it is interesting that, for matched pairs, you can really view the subject as the stratum, an incredibly stratified circumstance where there are only two counts per table. If you then analyze the data that way with the Cochran-Mantel-Haenszel test, you wind up with exactly the same test as McNemar's test. It's just a conceptually neat idea, kind of a fun little fact.
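To convince yourself of the equivalence numerically, you can compute the Cochran-Mantel-Haenszel statistic directly over the per-pair strata. This is a plain-Python sketch; the discordant counts 150 and 86 are from the lecture, while the concordant counts 794 and 570 are assumed for illustration. The statistic collapses exactly to McNemar's (n12 - n21)^2 / (n12 + n21):

```python
# Each matched pair becomes its own 2x2 stratum: rows are time
# (first, second), columns are response (yes, no), one count per row.
strata = [
    ([[1, 0], [1, 0]], 794),  # yes at both times (assumed count)
    ([[1, 0], [0, 1]], 150),  # yes then no (n12-type pair, from lecture)
    ([[0, 1], [1, 0]], 86),   # no then yes (n21-type pair, from lecture)
    ([[0, 1], [0, 1]], 570),  # no at both times (assumed count)
]

num = den = 0.0
for tab, k in strata:
    a = tab[0][0]                              # yes at time 1
    r1, r2 = sum(tab[0]), sum(tab[1])          # row totals (each 1)
    c1, c2 = tab[0][0] + tab[1][0], tab[0][1] + tab[1][1]
    t = r1 + r2                                # 2 observations per stratum
    e = r1 * c1 / t                            # E[a] under the null
    v = r1 * r2 * c1 * c2 / (t * t * (t - 1))  # hypergeometric variance
    num += k * (a - e)
    den += k * v

cmh = num ** 2 / den        # CMH statistic, no continuity correction
n12, n21 = 150, 86
mcnemar = (n12 - n21) ** 2 / (n12 + n21)       # McNemar's chi-squared
print(cmh, mcnemar)
```

Only the discordant strata contribute: concordant pair tables have zero hypergeometric variance, which is why the concordant counts drop out and the two statistics agree exactly.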

Another fun little fact is that McNemar's test has an exact version. Consider only the off-diagonal cells, the discordant cells. Under the null hypothesis, pi_12 / (pi_12 + pi_21) is 0.5, right? Just look back at the null hypothesis: if the two are equal, then either one over the sum would be 0.5. It then turns out, also under H0, that n_21 (or n_12), given the sum, is binomial with success probability 0.5 and n_12 + n_21 trials. So you can use this to come up with an exact p-value for matched-pairs data.

Basically, what we're saying is this: under the null hypothesis that the two off-diagonal probabilities are identical, whether you landed in the upper right-hand cell or the lower left-hand cell is a coin flip for every matched pair. And we would have evidence against the null if a lot more pairs wind up in one of those two cells than in the other. This is actually highly related to the so-called non-parametric sign test, which we'll cover as well. What you're saying is that, under the null hypothesis, things should be exchangeable: whether a pair goes from approving to disapproving or from disapproving to approving, in our approval example. So let's actually work out an example.

Okay, so here we want to test H0: pi_21 = pi_12 versus Ha: pi_21 < pi_12. And I put in parentheses that this is pi_+1 < pi_1+, where pi_+1 is the approval at time 2 and pi_1+ is the approval at time 1, each disregarding the other time. So this is testing whether or not the approval at time 2 is lower than the approval at time 1. Okay, so that's the direction the alternative is looking at.

We saw 86 people who disapproved on the first survey and approved on the second survey, the n_21 cell, and we want to test whether that's smaller than what we would expect by chance. The probability of getting data as or more extreme in favor of the alternative is P(X <= 86). Because we're doing the exact version, we condition on the total number of off-diagonal counts, 86 plus 150, and use a binomial with a success probability of 0.5. That probability is about 0, so we reject the null hypothesis. For a two-sided test, just double the smaller of the two one-sided p-values; that's fine for the purposes of this class, though if you do it in R it will maybe do a slightly better procedure.
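The exact p-value described above needs nothing beyond the standard library; a minimal sketch using the discordant counts from the lecture:

```python
from math import comb

# Exact McNemar test: conditional on the 86 + 150 = 236 discordant
# pairs, n_21 ~ Binomial(236, 0.5) under H0.
n21, n12 = 86, 150
m = n21 + n12

# One-sided p-value: P(X <= 86) for X ~ Binomial(236, 0.5).
p_one_sided = sum(comb(m, k) for k in range(n21 + 1)) / 2 ** m
# Two-sided: double the smaller tail (capped at 1).
p_two_sided = min(1.0, 2 * p_one_sided)
print(p_one_sided, p_two_sided)
```

The one-sided p-value is tiny (on the order of 10^-5), matching the lecture's "about 0", so the null is rejected.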

Okay, I want to cover another thing that's often omitted when discussing these methods. The marginal odds ratio is the comparison of the odds of approval at time 1 relative to the odds of approval at time 2. So here I put the odds of approval at time 1 in the numerator of the odds ratio, and the odds of approval at time 2 in the denominator. It's a marginal odds ratio because these are all marginal probabilities. Right. And it is of interest in exactly the same way the difference in the marginal probabilities is of interest. But it's a different setting, right? It's a different setting than if we just sampled some people at time 1 and a different set of people at time 2, where we could assume they're independent. These are exactly the same people sampled twice, so we need to account for that.

At any rate, just like with the ordinary odds ratio, the way we construct the marginal odds ratio confidence interval is to first calculate the estimated marginal log odds ratio directly; it's given by theta-hat here. The variance of that estimate is given by this guy right here, where you put hats over everything and estimate the terms with the relevant sample proportions; you then take the square root to get the estimated standard error. And so you can use that to create a confidence interval for the marginal log odds ratio when you have matched 2 by 2 data.

Okay. So in the approval rating example, the marginal odds ratio compares the odds of approval at time 1 to the odds of approval at time 2. The log odds ratio works out to be 0.16, and the standard error works out to be 0.039. The confidence interval for the log odds ratio is then 0.16 plus or minus two standard errors, which gives about 0.084 to 0.236. You want to compare these to 0, because it's all on the log scale. And then exponentiate if you want the confidence interval for the marginal odds ratio rather than the marginal log odds ratio.
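A sketch of this calculation, again assuming the same table as before (the concordant counts 794 and 570 are not stated in the lecture) and using the delta-method variance for the marginal log odds ratio; it reproduces the quoted 0.16 and 0.039:

```python
import math

# Assumed approval table (discordant counts 150, 86 from the lecture).
n11, n12, n21, n22 = 794, 150, 86, 570
n = n11 + n12 + n21 + n22
p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
p1p, pp1 = p11 + p12, p11 + p21   # marginal approval at times 1 and 2

# Marginal log odds ratio: odds of approval at time 1 over time 2.
theta = math.log((p1p / (1 - p1p)) / (pp1 / (1 - pp1)))   # about 0.16

# Delta-method variance of theta-hat, with sample proportions plugged in.
var = (1 / (p1p * (1 - p1p)) + 1 / (pp1 * (1 - pp1))
       - 2 * (p11 * p22 - p12 * p21)
       / (p1p * (1 - p1p) * pp1 * (1 - pp1))) / n
se = math.sqrt(var)                                       # about 0.039

lo, hi = theta - 2 * se, theta + 2 * se         # CI on the log scale
print(theta, se, (math.exp(lo), math.exp(hi)))  # exponentiate for the OR
```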

Okay, I want to cover something that always comes up when I teach this class in person, because several people will have seen a different formula for the odds ratio for matched 2 by 2 tables. So I want to cover the one they've seen. And there's a difference: one of them is a conditional odds ratio, and the other is a marginal odds ratio.

So imagine we created a logit model for our approval rating data, where the logit of the probability that person i says yes at time 1 is alpha plus U_i, and the logit of the probability that person i says yes at time 2 is alpha plus gamma plus U_i. So U_i is a person-specific effect, alpha is common across both times, and gamma is the log odds ratio comparing the approval for a given person at time 2 to time 1, right? So notice you have to compare the same person, because otherwise these U_i's would not cancel out when you took the difference of the two logits.
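Written out, the model just described is (using Y_it for person i's yes/no response at time t, my notation rather than the slide's):

```latex
\begin{aligned}
\operatorname{logit} P(Y_{i1} = 1 \mid U_i) &= \alpha + U_i \\
\operatorname{logit} P(Y_{i2} = 1 \mid U_i) &= \alpha + \gamma + U_i \\
\operatorname{logit} P(Y_{i2} = 1 \mid U_i)
  - \operatorname{logit} P(Y_{i1} = 1 \mid U_i) &= \gamma
\end{aligned}
```

The person-specific effect U_i cancels in the difference only because the same person appears in both logits, leaving gamma as the subject-specific log odds ratio comparing time 2 to time 1.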

So each U_i contains a person-specific effect: a person with a large U_i is likely to answer yes on both occasions, and a person with a small or negative U_i is likely to answer no on both occasions. So gamma here is the log odds ratio comparing a response of yes at time 2 to a response of yes at time 1, and in this case gamma is a subject-specific effect. You only interpret gamma if, in fact, these U_i's cancel out, and that's where you get the so-called conditional formula for the odds ratio.

So one way to eliminate U_i is to use a so-called conditional estimator, where you condition on the total number of yes responses for each person. What you wind up with is, again, only looking at the discordant cells. The conditional ML estimate of this log odds ratio turns out to be the log of the ratio of the off-diagonal counts, and its standard error turns out to be the square root of the sum of the reciprocals of the off-diagonal counts. I think people prefer this because it's a simpler formula, but notice it has a very different interpretation.
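The conditional estimate is simple enough to verify directly with the lecture's discordant counts; the log of the ratio of the off-diagonal counts (written later in the lecture as log n_21 / n_12) gives a noticeably different answer from the marginal log odds ratio:

```python
import math

# Conditional ML estimate of the subject-specific log odds ratio:
# only the discordant counts enter (86 and 150, from the lecture).
n21, n12 = 86, 150

theta_cond = math.log(n21 / n12)          # about -0.556
se_cond = math.sqrt(1 / n21 + 1 / n12)    # about 0.135

ci = (theta_cond - 2 * se_cond, theta_cond + 2 * se_cond)
print(theta_cond, se_cond, ci)
```

The simple formula is part of the appeal, but as the lecture stresses, this estimate answers a subject-specific question, not the marginal one.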

In one case we were comparing the marginal probabilities. In the other case we had this formulation with person-specific random effects that had to cancel out. In other words, one of them averages across people and the other conditions on people. So they have different interpretations: one is called the marginal odds ratio, with its confidence interval, and the other is called the subject-specific odds ratio, with its confidence interval. The difference in interpretation is extremely subtle, but it still exists, and that's why you get different answers.

So let me just summarize here. The marginal ML estimate has a marginal interpretation: the effect is averaged over all the U_i values, if you want to relate it back to the same model. Okay. The conditional ML estimate has a subject-specific interpretation. And so if you ask me when you would want to use one versus the other: I kind of think that for policy-type questions you would want marginal statements, and for clinical-type questions you probably want subject-specific statements. But it's not perfectly clear.

But nonetheless, that's where the difference comes from. Basically, the logit is not a linear function, so averaging over people and then creating odds ratios differs from creating odds ratios and then averaging them; you just get different answers. And that's the difference between those two. I think it's a very subtle thing, and for the purposes of this class you can ignore it. I just wanted to present it in case you were among the subset of people who happen to have seen this formula, log n_21 / n_12 plus or minus its standard error: the reason it's different is that we're taking a different approach.

And the reason I take the marginal approach, especially in this class, is that everything related to matched 2 by 2 tables that we discuss is marginal. We talk about McNemar's test, the exact version of McNemar's test, and then the marginal odds ratio; everything is related to the marginal probabilities. So if you're okay with that, then just leave it. But if you're not okay with that, and you need to know why this is different from the formula you saw before, perhaps in an epi class or another biostat class, that's the reason: it's a different formulation.

Â Coursera provides universal access to the worldâ€™s best education,
partnering with top universities and organizations to offer courses online.