0:01

Now we will be moving on to Section C of the design lecture, and

in this section, we're going to talk

about testing for hypotheses other than superiority.

Frequently, when we think of the hypothesis of interest

in a clinical trial, we think of the superiority hypothesis.

Is treatment A better than treatment B?

Or is treatment B better than treatment A?

In this section we'll be talking about designs

where we're testing for equivalency or non-inferiority instead

of superiority.

These are designs that can be used to

compare a new intervention to an established intervention.

When we use one of these designs we might think that treatment A is as

good as or the same as treatment

B for treating or preventing a specific condition.

But we believe that the use of treatment A might have some other kind of

benefit such as less severe adverse events,

or treatment A might be easier to administer

than treatment B, or treatment A might be cheaper than treatment B.

Another use of these designs is to do head-to-head comparisons

to two or more established treatments for a specific condition.

This use has been discussed quite a bit recently with respect to comparative effectiveness research.

1:11

First, I'm going to introduce the equivalence design.

In the equivalence design, the objective is to show

that the intervention response falls

sufficiently close to the control response.

That is, we are trying to demonstrate the equivalence of the two treatments.

We could never show that the two treatments are

exactly equivalent, because that would require an infinite sample size.

So with the equivalence design, an important

question that we have to address very early

on in the design process is, how large can the difference be between two treatments,

for the treatments to be considered equivalent?

Usually we want that detectable difference to be extremely small.

We want to say that the difference between the treatments is

within a certain small margin in order to call the two treatments equivalent.

If the difference that we observe is larger than the margin that

we've set, we would say that these two treatments are not equivalent.

In equivalence design, we also want to make sure

that we have a high probability of detecting a difference

if it's larger than the small margin that we've defined.

So for both of these reasons, to rule out large differences and to have a large

probability of detecting a difference should it exist,

we need a large sample size for equivalence designs.

2:21

So as with the superiority design, the comparison that we want to

make in an equivalence design is between a null and an alternative hypothesis.

However, for an equivalence design, we flip the way

we define these two hypotheses.

That is, whereas for a superiority design we are used to saying that

the null hypothesis is that there is no difference between the treatments.

For an equivalence design, we say that the null hypothesis

is that there is a difference between the two treatments.

And our alternative hypothesis for the equivalence design

is that there's no difference between the two treatments.

So, since we flipped our null and alternative hypotheses, we are also essentially flipping our Type I and Type II errors.

So that for an equivalence design, the Type I

error is to show no difference when there is one.

And the Type II error is to show a difference when there isn't one.

3:13

I pulled this example of an equivalence design from PubMed.

In this study, which was coordinated by the Jaeb Center in Tampa, the objective was to compare two treatments for moderate amblyopia in children ages 7 to 12 years old.

The two treatments were weekend atropine or patching of the sound eye for two hours a day.

The investigators in this study had previously conducted

another trial where they tested the combination of

patching and atropine and they found that this

combination was effective in treating children with amblyopia.

But even after the trial, most health care providers still did not initiate combination therapy for children with amblyopia.

3:51

So, the investigators decided to test if the two

therapies were equivalent to one another when used individually.

The children in the study were seen for follow-up visits at 5 and 17

weeks following enrollment, and the primary outcome

was visual acuity, controlling for baseline acuity.

The study was designed to test the equivalence of patching and atropine.

The equivalence limit was five letters or one line on the ETDRS chart.

That is, the investigators felt that they should rule

out a difference of more than one line on

the ETDRS chart between the two groups in order

to call the two treatments equivalent to one another.

4:31

The last design that we're going to talk

about in this section is the non-inferiority design.

This is another example of testing a hypothesis other than superiority.

In this case, the objective is to determine whether a

new treatment is at least as good as an established treatment.

To do this we test to see if the hypothesis that

a new treatment is worse than the established treatment can be rejected.

So our null hypothesis is that the new treatment

is worse than the established treatment, and to reject this

hypothesis, we need evidence to show that the new

treatment is at least as good as the established treatment.

You'll note that this type of statistical test is, by definition, one-sided.

In other words, the observed estimates from which we would reject the null

hypothesis, are located entirely in one tail

of the probability distribution of the outcome.

Operationally, we need to show that the

new treatment's response, if worse, is still sufficiently

close to the established treatment's response so that

we are comfortable with saying that the new

treatment is as good as, or not worse than the established treatment.

Again, like with the equivalence design, we're

looking for a very small detectable difference.

But for the non-inferiority trial, the hypothesis is one-sided,

whereas with the equivalence design, the hypothesis is two-sided.

A one-sided test does not require as much evidence to reject

the null, as a two-sided test at the same error level.

Which means that a non-inferiority design does not require

as large a sample size as the corresponding equivalence design.

But you have to keep in mind that the cost of using a one-sided test is that you're rejecting the null with a lower level of evidence.
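The sample-size consequence of the one-sided test can be sketched with the standard normal-approximation formula for comparing two means; the standard deviation, margin, and error rates below are assumptions chosen only to make the comparison concrete.

```python
import math
from scipy.stats import norm

# Illustrative assumptions, not values from any particular trial
sigma, margin = 2.0, 0.5      # outcome SD and the small margin to rule out
alpha, power = 0.05, 0.90
z_beta = norm.ppf(power)

def n_per_group(z_alpha):
    # Classic two-group z-test sample-size formula
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / margin ** 2)

n_one_sided = n_per_group(norm.ppf(1 - alpha))      # z = 1.645 (non-inferiority)
n_two_sided = n_per_group(norm.ppf(1 - alpha / 2))  # z = 1.960 (two-sided test)

# The one-sided design needs fewer participants per group
print(n_one_sided, n_two_sided)
```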

6:11

An example of a non-inferiority design is the ADVANCE-2 trial, in which Apixaban was compared to Enoxaparin for the prevention of venous thromboembolism after total knee replacement surgery.

Enoxaparin is a low molecular weight heparin, and is frequently

used for the prevention of

venous thromboembolism after major joint replacement.

However, Enoxaparin increases the risk of bleeding, and it can be cumbersome to use.

So the investigators proposed that Apixaban, which is an orally active factor Xa inhibitor, might be as effective in preventing venous thromboembolism, but might have a lower bleeding risk and might also be easier to administer than Enoxaparin.

In ADVANCE-2, the patients were allocated

to receive oral Apixaban twice a day starting

12 to 24 hours after surgery or subcutaneous

injections of Enoxaparin starting 12 hours before surgery.

Both treatment groups had placebos or shams.

The treatments

were continued for 10 to 14 days after surgery.

The patients were assessed for the main outcome, which was a composite of asymptomatic and symptomatic DVT, non-fatal pulmonary embolism, and all-cause death, with any of these events having an onset during treatment or within two days of the last dose of treatment.

The study was designed to test non-inferiority.

The non-inferiority limit was set as the upper 95% confidence limit of the risk ratio of Apixaban versus Enoxaparin not exceeding 1.25, and the risk difference of Apixaban minus Enoxaparin not exceeding 5.6%.
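As a hypothetical illustration of how criteria like these might be checked, the sketch below computes large-sample 95% confidence limits for a risk ratio and a risk difference; the event counts are invented, not ADVANCE-2's actual results.

```python
import math
from scipy.stats import norm

# Invented event counts, NOT the actual ADVANCE-2 data
events_new, n_new = 150, 1000    # new treatment arm
events_ctrl, n_ctrl = 160, 1000  # established treatment arm

p_new, p_ctrl = events_new / n_new, events_ctrl / n_ctrl
z = norm.ppf(0.975)  # 1.96 for a 95% confidence interval

# Upper 95% confidence limit of the risk ratio (log-scale large-sample CI)
rr = p_new / p_ctrl
se_log_rr = math.sqrt(1/events_new - 1/n_new + 1/events_ctrl - 1/n_ctrl)
rr_upper = math.exp(math.log(rr) + z * se_log_rr)

# Upper 95% confidence limit of the risk difference
rd = p_new - p_ctrl
se_rd = math.sqrt(p_new*(1-p_new)/n_new + p_ctrl*(1-p_ctrl)/n_ctrl)
rd_upper = rd + z * se_rd

# Non-inferiority requires both upper limits to stay below their margins
print(rr_upper < 1.25, rd_upper < 0.056)
```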

So to reiterate, the goal of a

non-inferiority trial is to demonstrate that the

experimental treatment is not worse than the

control treatment by more than a pre-specified

small amount.

This amount is the non-inferiority margin.

8:07

On this slide we're going to look at

how non-inferiority margins are used with confidence intervals.

And some of you are probably familiar with the concepts of point estimates and confidence intervals from your biostatistics class. But since not everyone has had biostatistics, I'm just going to take a moment to review these concepts.

8:26

A point estimate is a single

value that estimates some population parameter based on our sample data.

An example is the sample mean, which is the average of the values in our sample.

And it's frequently used to estimate the unknown population mean.

An interval estimate specifies a range within which the

population parameter is estimated to lie based on the sample.

How likely the interval is to contain

the parameter is determined by the confidence level.

And that's usually expressed as a percentage.

The most commonly used confidence interval is the 95% confidence interval.

For a 95% confidence interval, one can expect

that if you sample repeatedly from the same

population, 95% of the confidence intervals of the

sample mean will contain the population mean of interest.
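That repeated-sampling interpretation can be checked with a short simulation; the population mean, standard deviation, and sample size below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(42)
mu, sigma, n, reps = 50.0, 10.0, 30, 10_000
t_crit = t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, n)
    half = t_crit * sample.std(ddof=1) / np.sqrt(n)
    # Does this sample's 95% CI for the mean contain the true mean?
    if sample.mean() - half <= mu <= sample.mean() + half:
        covered += 1

print(covered / reps)  # close to 0.95
```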

9:14

In this figure, we have several confidence intervals,

and these are indicated by the blue and

red horizontal lines.

The point estimates are designated with the short vertical lines

that you see in the middle of the confidence intervals.

9:27

These point estimates represent the sample estimate of the

treatment difference between the experimental and the control groups.

The solid black vertical line that runs from the top to the bottom, with the zero label just above it, is the zero line.

And point estimates that are close to the zero line indicate

that our best estimate is that there is not

much difference between the treatment effects in the two groups.

9:50

Point estimates that fall to the left of the solid zero line favor the experimental treatment, and point estimates that fall to the right of the zero line favor the control treatment.

10:10

You'll notice that there's another long vertical line that

is dashed and has a delta at the top.

This is our non-inferiority margin.

If the confidence interval crosses or falls

to the right of the non-inferiority margin,

then we cannot reject the null that the experimental is worse than the control.

10:42

So in this figure, the confidence intervals that are shaded

in blue fall entirely to the left of the delta line.

So in those cases we can say that we have shown non-inferiority.

The confidence intervals shaded in red cross the delta line, so we cannot say that we've shown non-inferiority.

Only the bottom confidence interval falls to the left of the

non-inferiority line and also entirely to the left of the zero line.

And so, in that scenario, we can say that

we've shown non-inferiority, and we have also shown superiority.
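The decision rule illustrated by the figure can be written down directly. In this sketch the difference is experimental minus control, negative values favor the experimental arm, and the intervals and margin are invented for illustration.

```python
def classify(ci_low, ci_high, delta):
    """Classify a confidence interval for the treatment difference
    against a non-inferiority margin delta (> 0) and the zero line."""
    if ci_high < 0:
        return "non-inferior and superior"   # entirely left of zero
    if ci_high < delta:
        return "non-inferior"                # entirely left of the margin
    return "non-inferiority not shown"       # crosses or exceeds the margin

delta = 2.0
for ci in [(-1.0, 1.5), (-0.5, 2.5), (-3.0, -0.5)]:
    print(ci, classify(*ci, delta))
```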

11:17

Trials are sometimes designed with

nested non-inferiority and superiority hypotheses.

Investigators might

design the trial so that if non-inferiority is established when the

study is finished, then they can go on and test for superiority.

11:32

The more common situation is when investigators fail to show

superiority, but they might then test if they can show non-inferiority.

So they can't say that the experimental treatment is better than the control,

but they can say that it's at least not inferior by some small amount.

In the example of the non-inferiority trial from the previous slide, the ADVANCE-2 trial,

the investigators had planned a priori to

test for superiority once they had established non-inferiority.

So this brings us to the end of the section on designs for hypothesis testing.

We've covered superiority, equivalence and non-inferiority hypotheses and

in the final section we'll cover adaptive designs.