Now we will be moving on to Section C of the design lecture, and in this section, we're going to talk about testing for hypotheses other than superiority. Frequently, when we think of the hypothesis of interest in a clinical trial, we think of the superiority hypothesis: is treatment A better than treatment B, or is treatment B better than treatment A? In this section we'll be talking about designs where we're testing for equivalence or non-inferiority instead of superiority. These are designs that can be used to compare a new intervention to an established intervention. When we use one of these designs we might think that treatment A is as good as, or the same as, treatment B for treating or preventing a specific condition, but we believe that the use of treatment A might have some other kind of benefit: it might have less severe adverse events, it might be easier to administer than treatment B, or it might be cheaper than treatment B. Another use of these designs is to do head-to-head comparisons of two or more established treatments for a specific condition. This use has been discussed quite a bit recently with respect to comparative effectiveness research. First, I'm going to introduce the equivalence design. In the equivalence design, the objective is to show that the intervention response falls sufficiently close to the control response. That is, we are trying to demonstrate the equivalence of the two treatments. We could never show that the two treatments are exactly equivalent, because that would require an infinite sample size. So with the equivalence design, an important question that we have to address very early in the design process is: how large can the difference between the two treatments be for the treatments to be considered equivalent? Usually we want that detectable difference to be extremely small.
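The margin idea can be sketched in code. This is a generic illustration with hypothetical numbers, not the analysis from any trial discussed in this lecture: we declare equivalence only if the whole confidence interval for the treatment difference sits inside the pre-specified margin.

```python
def equivalent_within_margin(diff, se, margin, z=1.96):
    """Hypothetical sketch: declare equivalence only if the whole
    confidence interval for the treatment difference (A - B) lies
    strictly inside (-margin, +margin).

    diff   -- observed difference between the treatment responses
    se     -- standard error of that difference
    margin -- pre-specified equivalence margin
    z      -- critical value used to build the confidence interval
    """
    lower, upper = diff - z * se, diff + z * se
    return -margin < lower and upper < margin

# A small observed difference with a tight interval -> equivalent
print(equivalent_within_margin(diff=0.5, se=0.4, margin=2.0))  # True
# The same difference with a wide interval -> cannot claim equivalence
print(equivalent_within_margin(diff=0.5, se=1.2, margin=2.0))  # False
```

In practice, equivalence is often tested with two one-sided tests (TOST); checking whether a confidence interval lies entirely inside the margin is the interval version of the same idea.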
We want to say that the difference between the treatments is within a certain small margin in order to call the two treatments equivalent. If the difference that we observe is larger than the margin that we've set, we would say that these two treatments are not equivalent. In an equivalence design, we also want to make sure that we have a high probability of detecting a difference if it's larger than the small margin that we've defined. So for both of these reasons, to rule out large differences and to have a high probability of detecting a difference should it exist, we need a large sample size for equivalence designs. As with the superiority design, the comparison that we want to make in an equivalence design is between a null and an alternative hypothesis. However, for an equivalence design, we flip the way we define these two hypotheses. That is, whereas for a superiority design we are used to saying that the null hypothesis is that there is no difference between the treatments, for an equivalence design we say that the null hypothesis is that there is a difference between the two treatments, and our alternative hypothesis is that there is no difference between the two treatments. Since we have flipped our null and alternative hypotheses, we are also essentially flipping our Type I and Type II errors, so that for an equivalence design, the Type I error is to show no difference when there is one, and the Type II error is to show a difference when there isn't one. I pulled this example of an equivalence design from PubMed. In this study, which was coordinated by the Jaeb Center in Tampa, the objective was to compare two treatments for moderate amblyopia in children ages 7 to 12 years old. The two treatments were weekend atropine or patching of the sound eye for two hours a day.
The investigators in this study had previously conducted another trial where they tested the combination of patching and atropine, and they found that this combination was effective in treating children with amblyopia. But even after that trial, most health care providers still did not initiate combination therapy for children with amblyopia. So the investigators decided to test whether the two therapies were equivalent to one another when used individually. The children in the study were seen for follow-up visits at 5 and 17 weeks following enrollment, and the primary outcome was visual acuity, controlling for baseline acuity. The study was designed to test the equivalence of patching and atropine. The equivalence limit was five letters, or one line, on the ETDRS chart. That is, the investigators felt that they should rule out a difference of more than one line on the ETDRS chart between the two groups in order to call the two treatments equivalent to one another. The last design that we're going to talk about in this section is the non-inferiority design. This is another example of testing a hypothesis other than superiority. In this case, the objective is to determine whether a new treatment is at least as good as an established treatment. To do this, we test whether the hypothesis that the new treatment is worse than the established treatment can be rejected. So our null hypothesis is that the new treatment is worse than the established treatment, and to reject this hypothesis, we need evidence to show that the new treatment is at least as good as the established treatment. You'll note that this type of statistical test is, by definition, one-sided. In other words, the observed estimates from which we would reject the null hypothesis are located entirely in one tail of the probability distribution of the outcome.
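To make the one-sided test concrete, here is a rough sketch with made-up numbers, not taken from any trial in this lecture: a simple normal-approximation test of the null hypothesis that a new treatment's adverse-event rate is worse than the standard's by at least the margin.

```python
from math import sqrt
from statistics import NormalDist

def noninferiority_p(events_new, n_new, events_std, n_std, margin):
    """Hypothetical sketch of a one-sided non-inferiority test for an
    adverse-outcome proportion, using a normal approximation.
    H0: p_new - p_std >= margin (new treatment worse by at least the margin)
    H1: p_new - p_std <  margin (new treatment non-inferior)
    """
    p_new = events_new / n_new
    p_std = events_std / n_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    z = (p_new - p_std - margin) / se
    # A small one-sided p-value lets us reject H0 and claim non-inferiority
    return NormalDist().cdf(z)

# Made-up counts; margin of 5 percentage points on the risk difference
p = noninferiority_p(events_new=45, n_new=500, events_std=50, n_std=500, margin=0.05)
print(p < 0.05)  # True: non-inferiority would be claimed in this sketch
```

Note that all the rejection evidence lies in one tail, which is why the test uses a single critical value rather than splitting the error level across two tails.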
Operationally, we need to show that the new treatment's response, if worse, is still sufficiently close to the established treatment's response that we are comfortable saying the new treatment is as good as, or not worse than, the established treatment. Again, as with the equivalence design, we're looking for a very small detectable difference. But for the non-inferiority trial, the hypothesis is one-sided, whereas with the equivalence design, the hypothesis is two-sided. A one-sided test does not require as much evidence to reject the null as a two-sided test at the same error level, which means that a non-inferiority design does not require as large a sample size as the corresponding equivalence design. But you have to keep in mind that the cost of using a one-sided test is that you're rejecting the null with a lower level of evidence. An example of a non-inferiority design is the ADVANCE-2 trial, in which Apixaban was compared to Enoxaparin for the prevention of venous thromboembolism after total knee replacement surgery. Enoxaparin is a low molecular weight heparin, and it is frequently used for the prevention of venous thromboembolism after major joint replacement. However, Enoxaparin increases the risk of bleeding, and it can be cumbersome to use. So the investigators proposed that Apixaban, which is an orally active factor Xa inhibitor, might be as effective in preventing venous thromboembolism, but may have a lower bleeding risk, and it might also be easier to administer than Enoxaparin. In ADVANCE-2, the patients were allocated to receive oral Apixaban twice a day starting 12 to 24 hours after surgery, or subcutaneous injections of Enoxaparin starting 12 hours before surgery. Both treatment groups had placebos or shams. The treatments were continued for 10 to 14 days after surgery. The patients were assessed for the main outcome, which was a composite of asymptomatic and symptomatic DVT, non-fatal pulmonary embolism, and all-cause death.
Events counted toward the outcome if they had an onset during treatment, or within two days of the last dose of treatment. The study was designed to test non-inferiority. The non-inferiority limit was set so that the upper 95% confidence limit of the risk ratio of Apixaban versus Enoxaparin could not exceed 1.25, and the risk difference of Apixaban minus Enoxaparin could not exceed 5.6%. So to reiterate, the goal of a non-inferiority trial is to demonstrate that the experimental treatment is not worse than the control treatment by more than a pre-specified small amount. This amount is the non-inferiority margin. On this slide we're going to look at how non-inferiority margins are used with confidence intervals. Some of you are probably familiar with the concepts of point estimates and confidence intervals from your biostatistics class, but since not everyone has had biostatistics, I'm just going to take a moment to review these concepts. A point estimate is a single value that estimates some population parameter based on our sample data. An example is the sample mean, which is the average of the values in our sample, and it's frequently used to estimate the unknown population mean. An interval estimate specifies a range within which the population parameter is estimated to lie, based on the sample. How likely the interval is to contain the parameter is determined by the confidence level, which is usually expressed as a percentage. The most commonly used confidence interval is the 95% confidence interval. For a 95% confidence interval, one can expect that if you sample repeatedly from the same population, 95% of the confidence intervals of the sample mean will contain the population mean of interest. In this figure, we have several confidence intervals, and these are indicated by the blue and red horizontal lines. The point estimates are designated with the short vertical lines that you see in the middle of the confidence intervals.
These point estimates represent the sample estimate of the treatment difference between the experimental and the control groups. The solid black vertical line that runs from the top to the bottom is the zero line, and point estimates that are close to the zero line indicate that our best estimate is that there is not much difference between the treatment effects in the two groups. Point estimates that fall to the left of the solid zero line favor the experimental treatment, and point estimates that fall to the right of the zero line favor the control treatment. In this figure, in order to show superiority of the experimental treatment, we need the 95% confidence interval to fall entirely to the left of the zero line. You'll notice that there's another long vertical line that is dashed and has a delta at the top. This is our non-inferiority margin. If the confidence interval crosses or falls to the right of the non-inferiority margin, then we cannot reject the null hypothesis that the experimental treatment is worse than the control. However, if the 95% confidence interval falls entirely to the left of the delta line, we can reject the hypothesis that the experimental treatment is worse than, or inferior to, the control. So in this figure, the confidence intervals that are shaded in blue fall entirely to the left of the delta line, so in those cases we can say that we have shown non-inferiority. The confidence intervals shaded in red cross the delta line, so we cannot say that we've shown non-inferiority. Only the bottom confidence interval falls to the left of the non-inferiority line and also entirely to the left of the zero line, and so, in that scenario, we can say that we've shown non-inferiority and we have also shown superiority. Trials are sometimes designed with nested non-inferiority and superiority hypotheses.
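The decision rules from the figure can be summarized in a short sketch with hypothetical intervals. Here negative differences favor the experimental arm and delta is the (positive) non-inferiority margin, matching the orientation of the figure just described.

```python
def classify_ci(ci_low, ci_high, delta):
    """Classify a 95% CI for (experimental - control), where negative
    values favor the experimental arm and delta > 0 is the
    non-inferiority margin. Hypothetical sketch, not a trial analysis."""
    if ci_high < delta:          # interval entirely left of the delta line
        if ci_high < 0:          # also entirely left of the zero line
            return "non-inferior and superior"
        return "non-inferior"
    return "non-inferiority not shown"   # crosses or lies right of delta

# Made-up intervals mirroring the three cases in the figure
print(classify_ci(-0.5, 1.0, delta=2.0))   # non-inferior
print(classify_ci(0.5, 2.5, delta=2.0))    # non-inferiority not shown
print(classify_ci(-3.0, -0.2, delta=2.0))  # non-inferior and superior
```

The last case, an interval left of both the delta line and the zero line, is exactly the situation where nested non-inferiority and superiority hypotheses both succeed.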
Investigators might design the trial so that if non-inferiority is established when the study is finished, then they can go on and test for superiority. The more common situation is when investigators fail to show superiority, but they might then test whether they can show non-inferiority. So they can't say that the experimental treatment is better than the control, but they can say that it's at least not inferior by more than some small amount. In the example of the non-inferiority trial from the previous slide, the ADVANCE-2 trial, the investigators had planned a priori to test for superiority once they had established non-inferiority. So this brings us to the end of the section on designs for hypothesis testing. We've covered superiority, equivalence and non-inferiority hypotheses, and in the final section we'll cover adaptive designs.