Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2


From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's just discuss maximum likelihood a little bit more. The value of theta where the curve reaches its maximum is the so-called maximum likelihood estimator. So it is the value of the parameter that is best supported by the data, given the likelihood. So it's called the maximum likelihood estimator, or MLE. So we could just define the MLE as the argument maximum of the likelihood over theta. And it has this nice interpretation that the MLE is the value of the parameter that makes the data that we observed most probable.

So the likelihood is kind of thinking of the joint probability of the data as a function of the parameter. So it's sort of like tuning that parameter to where it makes the probability of the data that we observed as large as possible, which seems to make sense, because we did observe the data that we observed, so it must be somewhat probable.
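As a minimal sketch of that idea (in Python here, though the course itself works in R, and with made-up data): fix the observed data, evaluate the joint probability as a function of the parameter over a grid, and the grid point where it peaks approximates the MLE.

```python
# Sketch: the likelihood views the joint probability of the *fixed* data
# as a function of the parameter. For a hypothetical 7 heads in 10
# Bernoulli trials, L(p) = p^7 * (1 - p)^3; the argmax over a grid
# approximates the MLE.
heads, n = 7, 10
grid = [i / 1000 for i in range(1, 1000)]           # candidate p values
lik = [p**heads * (1 - p)**(n - heads) for p in grid]
mle = grid[lik.index(max(lik))]                     # argmax of the likelihood
print(mle)                                          # close to 7/10 = 0.7
```

Tuning p any further away from the sample proportion can only make the observed data less probable, which is the interpretation the lecture gives.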

So here's some results. If we have iid normal data, the MLE of mu is X bar, and the MLE of sigma squared is the biased sample variance, where we divide by n instead of n minus 1. If X1 to Xn are Bernoulli, then the MLE of p is X bar, the sample proportion of 1s. If the Xi are binomial(n_i, p), then the MLE of p is the total proportion of 1s. If an X is Poisson(lambda t), then the MLE of lambda is X/t, the rate. And if you have a bunch of independent Poisson random variables, then the MLE of lambda is the total number of events divided by the total monitoring time.
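These results can be spot-checked numerically. Here is a sketch for the last one, using hypothetical counts and monitoring times: maximize the joint Poisson likelihood over a grid of lambda values and compare against total events over total time.

```python
from math import exp, factorial

# Sketch checking the Poisson result: for independent counts x_i observed
# over exposure times t_i (made-up numbers), the joint likelihood in
# lambda should be maximized near sum(x_i) / sum(t_i).
xs, ts = [3, 1, 4], [10.0, 5.0, 12.0]                # hypothetical data

def pois(x, mean):
    return mean**x * exp(-mean) / factorial(x)

def lik(lam):                                        # joint Poisson likelihood
    out = 1.0
    for x, t in zip(xs, ts):
        out *= pois(x, lam * t)
    return out

grid = [i / 10000 for i in range(1, 10001)]
lam_hat = max(grid, key=lik)
print(lam_hat, sum(xs) / sum(ts))                    # both near 8/27
```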

So let's go through this example right here, where you saw 5 failure events over 94 days of monitoring a nuclear pump. Assuming a Poisson model, plot the likelihood. And by the way, we already know what the MLE is, right? The MLE is 5 over 94.

Okay, let's see. So lambda is the parameter I'm interested in, so I'm just going to create a grid of lambda values to evaluate my function. My likelihood is just the Poisson density, now viewed as a function of all these lambda parameters, but remembering that in each case I sampled for 94 days with the data fixed at 5. And then the MLE for lambda, lambda hat, is equal to 5/94, so 94 times lambda hat is just 5. So if I plug in lambda hat for lambda, I will get the likelihood at the MLE.
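The lecture carries this calculation out in R; here is a hedged Python sketch of the same steps, with the likelihood scaled so the MLE sits at 1, as in the plot that follows.

```python
from math import exp, factorial

# Sketch of the pump example: 5 events in 94 days, Poisson likelihood
# viewed as a function of lambda, normalized by its value at the MLE.
x, t = 5, 94

def lik(lam):                                        # Poisson(lambda * t) at x
    m = lam * t
    return m**x * exp(-m) / factorial(x)

lam_hat = x / t                                      # MLE = 5/94
grid = [i / 10000 for i in range(1, 2001)]           # lambda from 0.0001 to 0.2
norm = [lik(l) / lik(lam_hat) for l in grid]         # normalized likelihood
peak = grid[norm.index(max(norm))]
print(peak)                                          # near 5/94, where the curve peaks
```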

So I plot my lambda against my likelihood, and I turn off the frame around the plot, because I don't like it. Line width is 3, and type = "l" connects the points in a line. And if you type expression(lambda), it'll actually put the symbol lambda rather than the word lambda, so you can be fancy like that if you'd like. And then here, this red line is just showing where the MLE is, and of course the likelihood, normalized to its maximum, achieves 1 at that point. And then I draw these two kind of reference lines, which I'll talk about in a second.

Okay, so this is the likelihood. So this is a plot of the estimated rate against the evidential support for that estimated rate. So if we wanted to compare this point and this point, the comparison would be the ratio of those two heights, okay?

Now, I like to draw these reference lines for the following reasons. This reference line is at one-eighth, and this reference line is at one-sixteenth. And the reason I like to draw them is as follows. I know that this line is at one-eighth right here. So take this value of lambda, where the likelihood crosses one-eighth, and compare it to the MLE value of lambda. Because the top is 1 and the bottom is one-eighth, that ratio is 8 if you put the MLE in the numerator, and one-eighth if you put this value in the numerator. The same thing goes for the value at the other corner. So, basically, the MLE is eight times better supported than that point.

But then if I take any other point in between the two, its height is a little bit less than the MLE's but above one-eighth, so we know that it's going to be less well supported than the MLE, but by a factor of less than eight.

Okay, but what's interesting, right, remember this line is one-eighth right here. Take this point right here, outside the interval. That value of lambda has likelihood less than one-eighth. So when we compare it to the MLE, the MLE is more than eight times better supported than that point.

Okay, so for any point in this range, you cannot find a point that's more than eight times better supported. For every point outside of that range, you can find a point that is more than eight times better supported.

So this collection of lambda values right here is exactly the set of points such that there is no other point that is more than eight times better supported, if we draw that line at one-eighth. Of course, this is all predicated on having normalized the likelihood so that the MLE is at 1.

And for this line right here, all of the lambda values that fall within this interval, because it's at one-sixteenth, are the points such that there is no point that is more than 16 times better supported.
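For the pump example, these intervals can be read off numerically. A sketch (again in Python rather than the course's R): collect the lambda values whose normalized likelihood stays above 1/8, and likewise above 1/16.

```python
from math import exp, factorial

# Sketch of the likelihood intervals for the pump data: the 1/8 interval
# is the set of lambda whose normalized likelihood exceeds 1/8; the 1/16
# interval is the same construction with a lower cutoff, so it is wider.
x, t = 5, 94

def lik(lam):
    m = lam * t
    return m**x * exp(-m) / factorial(x)

top = lik(x / t)                                     # likelihood at the MLE
grid = [i / 10000 for i in range(1, 3001)]
in8  = [l for l in grid if lik(l) / top >= 1 / 8]
in16 = [l for l in grid if lik(l) / top >= 1 / 16]
print(min(in8), max(in8))                            # endpoints of the 1/8 interval
print(min(in16), max(in16))                          # wider 1/16 interval
```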

And that's kind of useful, right? So where do I get 8 and 16? Well, 8 is 2 cubed and 16 is 2 to the 4th. I get those because if you do these coin-flipping experiments, where you compare the likelihood of a coin being two-headed versus the likelihood of the coin being fair, you find that, kind of intuitively, people start switching. If they get three consecutive heads, they start saying, oh, that coin appears to be unfair, to the tune of being two-headed. And at about four consecutive heads, they're quite certain that the coin is unfair.
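The arithmetic behind those benchmarks is just a likelihood ratio, sketched here:

```python
# Sketch of the coin-flip calibration: after k consecutive heads, the
# likelihood of "two-headed" (P(head) = 1, so data probability 1) over
# "fair" (P(head) = 1/2, so data probability (1/2)^k) is 2**k,
# giving 8 for k = 3 and 16 for k = 4.
ratios = {k: 1.0 / 0.5**k for k in range(1, 5)}
print(ratios)                                        # {1: 2.0, 2: 4.0, 3: 8.0, 4: 16.0}
```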

So I consider these kind of like moderate and sort of strong evidence in favor of the points within these lines. So, just to reiterate, if you take this interval, it's basically the likelihood equivalent of a confidence interval. If you take all of these points, there is no point that is more than eight times better supported, given the data and the model. If you take a point outside of it, say this point, we know at least one point, namely the MLE, that's more than eight times better supported. So those are the so-called likelihood intervals. And you might say, well, wait, the one-eighth is kind of arbitrary. But it's no more or less arbitrary than, say, constructing a 95% confidence interval.

What you really want to give people is the full plot, because it conveys all of the relevant information. The only problem with the likelihood is that if you have multiple parameters, like in a regression setting, then you have to figure out how to display something that isolates just the likelihood for the parameter of interest. But I wanted everyone to at least hear about the likelihood. For the remainder of the class, we'll be focusing more on frequentist-style inference that does not show likelihood-based plots. But I wanted people to be aware of this style of inference.

Â Coursera provides universal access to the worldâ€™s best education, partnering with top universities and organizations to offer courses online.