Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

41 ratings


From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so just expanding on these points: matched binary data can arise from several circumstances, for example when measuring responses at two occasions, when matching on case status in a retrospective study, or when matching on exposure status in a prospective or cross-sectional study. In all of these cases, matching induces a dependency, and that has to be accounted for in the analysis.

So the pairs of binary observations are dependent; in other words, your response at time one is correlated with your response at time two, so our existing methods don't apply. However, we assume that person one, who responded at time one and time two, is independent of person two, who responded at time one and time two. So we're assuming independence across pairs and dependence within pairs.

Okay, so let's look at some notation. Here we use our standard contingency table notation, with n11, n12, n21, n22 for the four cells, and then the margins n+1, n+2, n1+, n2+. So here's our data, the n's, and we're going to assume that the four cell counts n11, n12, n21, n22 are multinomial with n, the sum of them, trials, and associated probabilities conveniently labeled pi11, pi12, pi21, and pi22. In other words, we assume that every pair of measurements, every time 1, time 2 collection, contributes a one to exactly one of these four cells: the person will have said yes at both occasions, a yes and then a no, a no and then a yes, or a no at both occasions. The probability of landing in a particular cell is pi_ij, and the multinomial is just the sum of all of these multivariate Bernoullis. We denote the margins with a plus: n1+ for the row margin count, pi1+ for the row margin of the probabilities, and so on.
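Laid out as a table, the notation just described reads (a standard two-by-two layout; the time labels follow the yes/no example in this lecture):

```latex
\begin{array}{c|cc|c}
 & \text{Yes at time 2} & \text{No at time 2} & \\ \hline
\text{Yes at time 1} & n_{11}\;(\pi_{11}) & n_{12}\;(\pi_{12}) & n_{1+} \\
\text{No at time 1}  & n_{21}\;(\pi_{21}) & n_{22}\;(\pi_{22}) & n_{2+} \\ \hline
 & n_{+1} & n_{+2} & n
\end{array}
```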

And so pi1+ and pi+1 are the marginal probabilities of a yes response at each of the two occasions, disregarding the other occasion. That is, pi1+ is the probability of saying yes at time 1 regardless of whether or not you said yes at time 2, and pi+1 is the probability of saying yes at time 2 regardless of whether or not you said yes at time 1. So marginal homogeneity is the hypothesis that these two marginal probabilities are the same, pi1+ = pi+1; that's how it gets its name. And of course, because there are only two possible responses, if pi1+ equals pi+1, then pi2+ equals pi+2. So all the marginal probabilities are the same, and that's why we call it marginal homogeneity.

You can do a very quick calculation: pi1+ is pi11 + pi12, and pi+1 is pi11 + pi21, so pi11 is common to both. If you subtract it out, this hypothesis is identical to pi12 = pi21. That hypothesis is referred to as symmetry, because it involves the off-diagonal elements of the table; it's basically saying that the true probability two-by-two table would be identical under transposition. So that property is called symmetry, and hence the marginal homogeneity hypothesis is equivalent to symmetry. That equivalence holds only in the case of a two-by-two table; in more general tables it's not true.
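In symbols, the quick calculation just described is:

```latex
\pi_{1+} = \pi_{11} + \pi_{12}, \qquad \pi_{+1} = \pi_{11} + \pi_{21}
\quad\Longrightarrow\quad
\pi_{1+} = \pi_{+1} \iff \pi_{12} = \pi_{21}.
```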

We clearly have estimates for all of the pis: the estimate of pi12 is just n12 divided by n, the estimate of pi21 is n21 divided by n, and so on, simply the proportions. That is, the estimates of the true probabilities of landing in each cell are the proportions of people who landed in each cell. So the obvious estimate of the difference between pi12 and pi21, i.e. of how far away from symmetry, or in other words from marginal homogeneity, you are, is n12/n minus n21/n.
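That is, writing hats for the sample proportions:

```latex
\hat\pi_{12} - \hat\pi_{21} = \frac{n_{12}}{n} - \frac{n_{21}}{n} = \frac{n_{12} - n_{21}}{n}.
```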

And it turns out, and this is maybe a little bit involved for us to go through, that under H0 a consistent estimate of the variance of this difference is (n12 + n21) divided by n squared. So if you take the difference as your statistic, n12/n minus n21/n, and divide it by the standard error, the square root of (n12 + n21) divided by n squared, you get a so-called z statistic. The preference in this case is typically to square that statistic; I think that matches the traditional development. And the square of that statistic works out to have this convenient form: (n12 minus n21) squared over (n12 plus n21). This follows a chi-squared distribution with one degree of freedom, because of course a z statistic squared follows a chi-squared distribution with one degree of freedom.
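As a sketch, the statistic and its chi-squared p-value can be computed with nothing but the Python standard library; for one degree of freedom, P(chi-squared > x) = erfc(sqrt(x/2)). The function name here is mine, not from the lecture:

```python
import math

def mcnemar(n12, n21):
    """McNemar's test statistic (no continuity correction) and its p-value.

    Only the discordant cell counts n12 and n21 enter the test.
    """
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # For a chi-squared variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# The lecture's approval-rating example: discordant cells 86 and 150.
stat, p = mcnemar(86, 150)
print(round(stat, 2), p)
```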

So this is the famous McNemar's test statistic, and you reject marginal homogeneity if the test statistic is large. This test is called McNemar's test. Notice what's interesting about McNemar's test: only n12 and n21 are used. They're the only cells that carry the relevant information about pi1+ and pi+1 being different. Now, n11 and n22, the concordant cells, contribute to the estimated magnitude of this difference through n, but in testing whether or not the marginals differ, only the discordant cells matter, n12 and n21, where people's responses disagreed from time 1 to time 2. So that's an interesting fact about this test. It's called McNemar's test, and it's a very famous statistic.

Okay, so let's look at our test statistic from the approval rating example. We have 86 and 150 as the off-diagonal cells, so that's (86 minus 150) squared over (86 plus 150), which works out to be 17.36. The p-value is then extremely small, right? Because a chi-squared variable with one degree of freedom is extremely unlikely to be above 9, which is 3 squared, and 3 is way out in the tail of the standard normal. Hence we reject the null hypothesis and conclude that there appears to be some sort of change in opinion between the polls.

At any rate, in R you can just do mcnemar.test; you have to give it a matrix. And again, this is one of those instances where, if you want to get exactly the statistic you worked out by hand, you have to put correct = FALSE, because by default it does a continuity correction. In general you want to leave the continuity correction in; I'm just setting it to FALSE here so that it matches your by-hand calculations exactly.
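To see what correct = FALSE changes: R's default continuity correction subtracts 1 from |n12 - n21| before squaring. A quick check with the example's counts (a sketch; numbers are the off-diagonal cells from the lecture's example):

```python
n12, n21 = 86, 150

# Uncorrected statistic, matching the by-hand calculation above.
uncorrected = (n12 - n21) ** 2 / (n12 + n21)
# Continuity-corrected version, which mcnemar.test applies by default.
corrected = (abs(n12 - n21) - 1) ** 2 / (n12 + n21)

print(round(uncorrected, 2), round(corrected, 2))
```

Here the correction changes the statistic only slightly, and either way the result is far beyond the rejection threshold.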
