Hi, this video is about how to analyze your outcome data after matching. So we'll assume that you've already matched your data, you've checked for balance, you're happy with that, and now you're ready to analyze your outcome data. So we'll identify appropriate statistical methods for analyzing matched data. And we'll especially talk about randomization tests, what they are and why they might be useful. Again, we'll imagine that we successfully matched, so we've checked for balance and we're ready to proceed with the outcome analysis. And there are a few things that you might want to do. One is to test for a treatment effect. So you might be interested in hypothesis testing, just answering the question, is there a treatment effect? You're likely also interested in estimating the treatment effect, along with a confidence interval. And whatever method you use should take matching into account. So we'll begin with randomization tests, and these are also known as permutation tests or exact tests. These are the kinds of tests that you could use if you actually had randomized data, data from a randomized trial. But since we've already matched, we're going to basically, at this point, treat the data as if it was from a randomized trial. And so the main idea here is to first choose a test statistic, and then to compute that test statistic from your observed data. Next, you assume that the null hypothesis is true, meaning that there's no treatment effect. Then we permute the treatment assignment randomly within each pair and re-compute the test statistic. We can do this over and over again: keep permuting the treatment assignment, calculating the test statistic, and recording it.
Now, what we'll do is see whether the test statistic that you observed is unusual relative to these permuted ones, which were calculated under the assumption that the treatment assignment was randomized. So if the null hypothesis is true, your observed test statistic should be pretty consistent with the ones that you see from permuting the data in this way. We'll consider an example here to make the ideas more clear. Suppose we have a binary outcome, so just a yes or no kind of outcome, and 13 matched pairs. We'll consider just 13 matched pairs so that we can see the whole dataset and it'll be easier to understand. And what we'll choose as a test statistic here is just the number of events in the treated group. So we'll just count how many times the outcome occurred in the treated group. You could pick a different test statistic and you should essentially come to the same conclusion, but we'll just pick a count of the number of events in the treated group. So here are some hypothetical data on 13 matched pairs. On the far left we have the matched pair number; you could think of this as just an identification variable saying which matched pair it is. So matched pair 1 is the first matched pair, and the numbers in the table are the actual outcomes. In the first matched pair, the person in the treated group did not have the outcome, so their outcome was equal to 0. The person in the control group also had an outcome of 0. These individuals were matched based on their covariates, and then we looked at their outcomes, and their outcomes happened to be the same, and they happened to both be 0. So that's a concordant pair; they agree with each other. You'll see in matched pair 2, the second matched pair, the treated subject had the outcome, whereas the control did not.
So the treated subject had the outcome with a value of 1, and for the control, the value was 0. That's a discordant pair. Now, our test statistic, we said, was a count of the number of events in the treated group. So all we would do is look in the treated group and count the number of times we see a 1. If we add those up, we should get a 6; there are 6 ones in the treated group. So that's our test statistic from the observed data. And what we want to know is, is that an unusual number? And what we mean by that is, if the null hypothesis was actually true, where treatment didn't matter at all, treatment doesn't do anything, then would we see a test statistic like 6? Or is it unusual? The way we're going to tell whether 6 is an unusual number or not is by randomly permuting the values in each row. So we're going to randomly permute these outcome values in each row, and I'll show you an example of that. What I first did was highlight, or color in red, the rows where there are discordant pairs. And this is important because in the concordant pairs, permuting isn't going to do anything, right? For example, matched pair 1 is always going to be a 0-0 pair whether I change the label of who is treated and who is control or not. So there's nothing interesting happening there; all of the information about a treatment effect should come from the discordant pairs. And you'll see in this case there are 7 of them in red; those are the discordant pairs. And what we're going to do is just randomly shuffle those 0s and 1s. You can think of that as randomly relabeling who is a treated subject and who is a control.
And again, the reason we're doing this is that under the null hypothesis it shouldn't matter who's the treated subject and who's the control, because treatment isn't doing anything; treatment has no impact. So I should be able to just randomly permute these labels. And you could do that just by randomly drawing from a binomial distribution with probability 0.5. So I'll show you an example. Here's permutation 1, and if we slide back and forth, you'll see that a couple of these changed. Here's the original data, here's the permuted data, and I did this at random. Just to make extra clear what we're doing here, take matched pair 13, for example. You could think of it as: I'm going to flip a coin, and if heads comes up, I keep the pair the same; if tails comes up, I swap, so that now the treated value is a 0 and the control value is a 1. So there's a 50% chance I'll flip them, and a 50% chance they'll stay the same. And I did that, and now a couple of them have changed. What I do now is recalculate the test statistic, so I count how many 1s appear in the treated column, and it ends up being a 5 here. And now I'll just keep doing that. Here's the second permutation; this one gives a test statistic of 7. I permute again, I get a test statistic of 3. And I'll just keep doing this and recording the test statistic. You could do this thousands of times and then see whether the value that you observed is unusual relative to all these permuted ones. So if we carry out the permutations a thousand times and record the test statistic each time, then we can plot the distribution of the test statistic. And that's what we've done here. What you'll see is that T is the test statistic, so that's just the count of the number of times the event occurred in the treated group, and what's plotted is an actual count out of the 1,000 total times that I did this.
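The permutation loop just described can be sketched in base R. The pair outcomes below are reconstructed from the counts in this example (one 1-1 pair, five 1-0 pairs, two 0-1 pairs, five 0-0 pairs), not copied from the slide, so the row ordering is an assumption:

```r
set.seed(1)
# Outcomes for the 13 matched pairs, reconstructed from the counts described
treated <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
control <- c(1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)

t_obs <- sum(treated)  # observed test statistic: events in the treated group

# One permutation = one coin flip per pair deciding whether to swap labels;
# re-count events in the "treated" column, and repeat 1,000 times
perm_stat <- replicate(1000, {
  swap <- rbinom(13, 1, 0.5) == 1
  sum(ifelse(swap, control, treated))
})

t_obs                                   # 6
table(perm_stat)                        # the permutation distribution
mean(perm_stat >= 6 | perm_stat <= 3)   # two-sided p-value, roughly 0.45
```

Note that only the 7 discordant pairs ever change the count; swapping a concordant pair leaves the statistic untouched, which is why all the information sits in those red rows.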
And you'll see that, for example, a value of 1 is quite unusual; it probably only occurred about ten times out of a thousand, whereas values of 4 and 5 are quite common. So again, we are permuting under the assumption that there is no treatment effect, we record the test statistic each time, and this is the distribution of that statistic. And what we want to know then is, is the test statistic that we observed unusual? The one we observed was a value of 6. So our observed statistic was a 6, and you'll see that that is quite consistent with what we would see if there was no actual treatment effect. It's right towards the middle of that distribution, so what we've observed is not inconsistent with the null hypothesis of no treatment effect, and therefore we really don't have evidence that, in this case, treatment is doing anything. If you wanted to calculate a p-value, the intuition here is that what we really do is calculate the probability of something as extreme or more extreme than our particular value. In this case, that means we would add up the probability of being a 6, a 7, or an 8; those are as extreme or more extreme than our value. But because this distribution is symmetric, we would also need to add up the probability of being a 1, a 2, or a 3. Those are equally unusual in the sense that, remember, we are just counting the number of events, but we also could have counted the number of non-events, and either of those is the same kind of statistic. In other words, having a test statistic of a 1 is equally unusual to having a test statistic of an 8. So we want to count both tails. If you were going to actually get a p-value, you could just add up the probabilities of being 1, 2, 3, 6, 7, or 8, and that would tell you how likely it was to have a statistic as unusual or more unusual than ours.
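That two-tailed sum can also be computed exactly rather than by simulation. Under the null, the one concordant 1-1 pair always contributes an event, and each of the 7 discordant pairs contributes one with probability 1/2, so the statistic is 1 plus a Binomial(7, 0.5) count; statistic values {1, 2, 3} and {6, 7, 8} correspond to binomial counts {0, 1, 2} and {5, 6, 7}:

```r
# Exact two-sided p-value: sum the binomial probabilities in both tails.
# Statistic = 1 + X, with X ~ Binomial(7, 0.5) over the discordant pairs.
p_exact <- sum(dbinom(c(0, 1, 2, 5, 6, 7), size = 7, prob = 0.5))
p_exact  # 0.453125, i.e. about 0.45
```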
Another way you can get that p-value is to use McNemar's test, which is a test that's often used for paired binomial data. And here I created the kind of table that you would typically use for this type of test, where what we have is a count of treatment-control pairs based on the outcome. So here we have the treated group, we have the control group, and then we have the outcome. For example, this first cell here is a count of how many pairs there were where both the treated and control subjects had a value of 1. And remember, from our original data there was one pair where the value was 1 for both of them, so that's where that count is coming from. The other concordant cell is when both individuals have a 0 for the outcome; there were five of those in our original data. Then there are the discordant pairs. There were 5 cases where the treated subject had the outcome but the control didn't, and 2 cases where the treated subject did not have the outcome but the control did. Really, all of the information about whether there's a treatment effect comes from those discordant pairs, in effect comparing that 5 versus the 2. If there's no treatment effect, you would expect those numbers, the off-diagonal here, to be about the same. But if they're very different, then there would be evidence of a treatment effect. And you can actually test that in R using the following commands. The first thing I do is create this 2x2 matrix. I can enter that right into R using the matrix command: I enter the numbers 1, 5, 2, and 5, tell it it's a 2x2 matrix, and name it ex; I just happened to name my matrix ex. Now I can just use the mcnemar.test function, tell it my matrix is ex, and the output is displayed below.
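Put together, those commands look like this (the entries 1, 5, 2, 5 are the pair counts from the table; only the off-diagonal discordant counts, 5 and 2, drive the test):

```r
# 2x2 table of pair counts: the diagonal holds the concordant pairs
# (1 both-events pair, 5 both-non-events pairs), the off-diagonal the
# discordant pairs (5 and 2)
ex <- matrix(c(1, 5, 2, 5), nrow = 2, ncol = 2)

# McNemar's chi-squared test (with continuity correction by default);
# the p-value comes out to about 0.45
mcnemar.test(ex)
```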
And what we see is that we end up with a p-value of about 0.45. So 0.45, that's a large p-value, far from anything significant, so it's saying that we really don't have evidence of a treatment effect. And in fact, that 0.45 from the McNemar test is roughly the same as what you'd get by adding up those 6 regions and dividing by 1,000. So they all end up with the same result. So this is one way you can analyze matched data, or test a null hypothesis, using a randomization test. This is appropriate for matched, paired binomial data, and you could just use this test if you wanted strictly to test the null hypothesis. But the randomization test is really a general idea; you don't have to have binomial data. You could use it for continuous data, for example. As a hypothetical example, let's imagine that our outcome is systolic blood pressure, and we have that from 20 treatment-control pairs. A test statistic that you might want to use, then, is just the difference in sample means: the average systolic blood pressure in the treated group minus the average systolic blood pressure in the control group. So here are some hypothetical data where we have pairs of observations; these are paired subjects who were matched on covariates. We have the control here, the treated here, and the pair ID, so for example, this is the first pair. Our outcome is systolic blood pressure, so what you see in the body of the table is the systolic blood pressure. For example, in the first matched pair, the control had a systolic blood pressure of 106, and the treated patient had a systolic blood pressure of 101. I said that the test statistic we might want to use is the average systolic blood pressure in the treated group minus that in the control group. So we would just stratify by treatment group, calculate a sample mean for each, and then take the difference.
And if you do that, you get a difference of about -8.2. So that's what I'll call T. We have twenty pairs, the difference in means is about -8, and what I want to know is: is that unusual? Is that the kind of thing you would see if the null hypothesis is true? We can use the same idea here, where we just randomly permute the labels, recalculate the test statistic, and keep doing that. So for each of these pairs, we're going to flip a coin to decide which one's a control and which one's a treated subject. We're just reassigning the label of whether they are treated or control. There will still be one treated and one control subject in each pair, but we're going to randomize which is which. Again, the idea is that if treatment doesn't matter at all, these are just arbitrary labels and we should be able to just randomly assign them. So that's exactly what I did here. I did this 1,000 times, and this is the distribution of the test statistics. T is the difference in sample means of systolic blood pressure between treated and control, and here is the frequency of how many times each value occurred out of 1,000. What you'll see is that this distribution is centered around 0, which is basically what you would expect if there's no treatment effect. So it's centered around 0, and then we see how far the tails stretch. If there was actually no treatment effect, it wouldn't be extremely unusual to see a value around 4 or 5, or around -3 or -4. But our value was around -8, which is what this vertical bar is. So in this case, what we see is that our value was actually quite unusual. And in fact, the p-value is about 0 here: I permuted the labels 1,000 times, kept recalculating the test statistic, and never did I get a value this extreme.
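Here is a sketch of that label-swapping test in R. The blood pressure values are simulated stand-ins (the lecture's actual 20 pairs aren't listed in the transcript), built with a treated-minus-control effect of about -8; the key trick is that swapping the labels within a pair just flips the sign of that pair's difference:

```r
set.seed(2)
n_pairs <- 20
# hypothetical systolic blood pressure data with a true effect near -8
control <- rnorm(n_pairs, mean = 125, sd = 10)
treated <- control - 8 + rnorm(n_pairs, mean = 0, sd = 4)

t_obs <- mean(treated) - mean(control)  # observed difference in means
d <- treated - control                  # within-pair differences

# Swapping treated/control labels within a pair flips the sign of d,
# so each permutation multiplies each difference by a random +/- 1
perm_stat <- replicate(1000, mean(d * sample(c(-1, 1), n_pairs, replace = TRUE)))

t_obs                               # close to -8
mean(abs(perm_stat) >= abs(t_obs))  # two-sided p-value, near 0
```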
Never did I get a value as small as -8. So the p-value is about 0 there. The p-value is the proportion of times that you would see something as extreme or more extreme, and we really never saw anything that extreme. So we would reject the null hypothesis that there's no treatment effect. In this case, we have evidence of a treatment effect. Another way you could do this is with what's known as a paired t-test. Hopefully you're familiar with this idea: if you have continuous data but it's matched pairs, you can use a paired t-test. Here is the R code to do that: we call t.test and let it know that the data are paired. And what it shows is that the mean difference that we observe, our point estimate, is -8.2, and the 95% confidence interval goes from -10.7 to -5.7. And we have an extremely tiny p-value. So we have strong evidence that there is a treatment effect, in the direction of a negative effect. Other outcome models are also possible. Besides the kinds of outcome analysis methods that I just talked about, such as the paired t-test and permutation tests, some of the kinds of models you're more familiar with could also be used. For example, you could use conditional logistic regression, which is appropriate for matched binary outcome data. You could use a stratified Cox model if you have time-to-event or survival type data. This is exactly like the usual Cox model, except it would stratify the baseline hazard function on matched sets. So you would have in your dataset a matched ID variable, and you would let the software know to stratify on that match ID variable. You could also use generalized estimating equations, or GEE, and again you would tell the software to cluster on match ID. This is especially useful for binary outcomes, because you could use GEE to estimate a causal risk difference, a causal risk ratio, or a causal odds ratio.
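The paired t-test call looks like this. The data here are simulated stand-ins for the lecture's 20 pairs (the exact values aren't in the transcript), so the estimate and interval will be near, but not identical to, the -8.2 and (-10.7, -5.7) reported on the slide:

```r
set.seed(4)
# hypothetical paired systolic blood pressure data, true effect about -8
control <- rnorm(20, mean = 125, sd = 10)
treated <- control - 8 + rnorm(20, mean = 0, sd = 4)

# paired = TRUE tells t.test these are matched pairs, so it analyzes
# the within-pair differences rather than two independent samples
fit <- t.test(treated, control, paired = TRUE)
fit  # point estimate near -8, a tight confidence interval, tiny p-value
```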
And which one of those you want determines which link function you use. So if you want a causal risk difference, you would use an identity link; if you want a causal risk ratio, you would use a log link.
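To make the link-function point concrete, here is a minimal sketch using plain glm on simulated binary data. This is only an illustration of how the link choice maps to the estimand; it ignores the pairing, whereas a real matched analysis would fit a GEE clustered on the match ID (for example with the geepack package). The sample sizes, probabilities, and starting values below are all assumptions:

```r
set.seed(3)
n <- 5000
tr <- rep(c(1, 0), each = n)  # treated / control indicator
# hypothetical event probabilities: 0.4 if treated, 0.2 if control,
# so the true risk difference is 0.2 and the true risk ratio is 2
y <- rbinom(2 * n, 1, ifelse(tr == 1, 0.4, 0.2))

# identity link: the coefficient on tr is a risk difference
fit_rd <- glm(y ~ tr, family = binomial(link = "identity"),
              start = c(0.2, 0.1))
# log link: the exponentiated coefficient on tr is a risk ratio
fit_rr <- glm(y ~ tr, family = binomial(link = "log"),
              start = c(log(0.2), 0))

coef(fit_rd)["tr"]       # roughly 0.2
exp(coef(fit_rr)["tr"])  # roughly 2
```

(The usual logit link would give an odds ratio instead; start values are supplied because non-canonical binomial links can otherwise fail to converge.)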