
So, this problem is particularly useful for motivating so-called Bayesian

analysis. And, we've spent a lot of time in this

class talking about frequentist analysis in the form of confidence intervals.

And we've spent a fair amount of time talking about the likelihood,

Probably, more time devoted to the likelihood than most introductory
statistics courses do. So, we need to give at least some time to

talk about Bayesian statistics. So, here's how Bayesian statistics works.

So, Bayesians have to posit a prior on the parameter of interest.
The prior is a density or mass function. It's a probability distribution on the

parameter where the probabilities, at least in the classical Bayesian sense,

represent our beliefs about that parameter.

And then, the likelihood is the component of the Bayesian equation that depends on

the data, the objective part. And then, the posterior we're going to obtain as the

likelihood times the prior. So, this is exactly like, if you remember

back when we were thinking about diagnostic tests.

We had, say, for example, some prior belief that a person had the condition

that the test was trying to diagnose. We have the data, which is the result of

the test. And the posterior odds of the person

having the disease or whatever the test is testing, wound up being related to the

likelihood ratio times the prior odds. And so, it's the exact same sort of

relationship here, posterior equals likelihood times prior.

Now, I have to put a proportional to sign here because it's not exactly equal;
we're off by a constant of proportionality. But, it's easiest to
think of it this way: we take our likelihood, we multiply it by our prior,
and we get our posterior. Now, Bayesian
statistics is a very neat and conceptually clean way to think about statistics.

The rub is really in here, specifying the prior.

That's where we get into trouble in Bayesian statistics.

And we'll talk maybe a little bit about that.

But mostly, in this class, we're just going to talk about the mechanics of how

you go about performing a Bayesian inference.

And then, you can take later classes to delve into the specifics of all the

different ways in which Bayesians can think about doing analysis.

So, let's talk about how we can specify a prior for our binomial proportion.

So remember, our binomial data is discrete.

It can take only values between zero and n, but the proportion that we're trying to

estimate is a number that, let's say, we're going to treat as if it's

continuous. So, if we're going to specify a

probability distribution on that parameter, it's going to have to be a

continuous distribution. So, we need a continuous distribution

that's bounded from below by zero, and bounded from above by one.

And ideally, it would be a nice distribution that's easy to work with.

Well, there is one such distribution, it's called the Beta Distribution.

So, the beta distribution winds up being kind of a default prior for binomial

proportions. And the beta density depends on two

parameters, alpha and beta. Don't confuse the alpha here with the
alpha earlier on in the lecture that was related to the coverage rate of the

confidence interval. So, it depends on two parameters, alpha

and beta. And the beta density looks like this.

It involves this so-called gamma function: gamma of alpha plus beta divided by gamma
of alpha times gamma of beta. And then, p raised to the alpha minus one times
one minus p raised to the beta minus one. And here, p is allowed to range between

zero and one. This constant term out front,

Gamma of alpha plus beta divided by gamma of alpha times gamma of beta,

That's simply the constant of proportionality that you have to obtain to

get this integral, Integral p to the alpha minus one, one

minus p to the beta minus one, to integrate to one.

So, you had some problems very early on in the class where if you had a kernel of a

density, in this case, p to the alpha minus one, one minus p to the beta minus

one, that had a finite integral, what you had to do was divide that function by its

integral over the whole range of values, and you get a proper density.

And that's exactly what people did to get to beta density.

So, here is this density. It does integrate to one.

And maybe it's a little bit beyond the scope of this class to verify that it

integrates to one. So, let's talk about some of the

properties of the beta density. So, the mean of the beta density is alpha

over alpha plus beta. And remember, alpha and beta are positive.

So, alpha over alpha plus beta has to be a number between zero and one.

So, we're good that the mean of the density lies in the range of values for

which the density is greater than zero. The variance of the density works out to

be alpha times beta divided by alpha plus beta squared, alpha plus beta plus one.
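Those two moment formulas are easy to sanity-check numerically. Here's a quick Python sketch (not part of the lecture), using the standard library gamma function and a crude midpoint-rule integral; the alpha = 3, beta = 5 values are just an arbitrary illustration:

```python
import math

def beta_pdf(p, a, b):
    # Beta density: Gamma(a + b) / (Gamma(a) * Gamma(b)) * p^(a-1) * (1-p)^(b-1)
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

def moment(k, a, b, m=100_000):
    # E[p^k] via a crude midpoint-rule integral over (0, 1)
    h = 1.0 / m
    return sum(((i + 0.5) * h) ** k * beta_pdf((i + 0.5) * h, a, b) * h
               for i in range(m))

a, b = 3.0, 5.0                 # arbitrary illustrative values
mean = moment(1, a, b)
var = moment(2, a, b) - mean ** 2
print(round(mean, 4))           # ≈ a / (a + b) = 0.375
print(round(var, 5))            # ≈ a*b / ((a + b)^2 * (a + b + 1)) ≈ 0.02604
```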

And we've seen special cases of the beta density before.

Take the special case when alpha equals beta equals one.

Well then, p to the alpha minus one, one minus p to the beta minus one, that all

just goes away and this density is just a constant between zero and one.

And we may not know what the gamma function of alpha plus beta over gamma of

alpha times gamma of beta is. But you don't need to, because you know

that the density is a constant density between zero and one.

It has to be exactly the uniform density then.

So, the uniform density is exactly a special case of the beta density.

Here, on the next slide, I plug in a bunch of different values of alpha and beta and
I show you the shape of the beta density.

So, if I plug in alpha equal to beta equal to 0.5, then I get something that looks

like a U shape. And, it heads off to
infinity as p heads towards both zero and one.

If alpha equals 0.5 and beta equals one, it looks like this shape right here.

And then, as beta gets larger and larger, the rate at which it drops down to zero as

p approaches one gets faster. And then, of course, it just reverses

itself. If beta is 0.5 and alpha is one, or alpha

is two. Again, here's the uniform distribution

when alpha and beta are both one. If you plug in an alpha of one and a beta

of two, you just get a line pointing downward.

If you plug in an alpha two, beta of one, you get a line pointing upward.

Probably, the most kind of typical-looking cases of the beta is when the alpha and

beta are both greater than one, and then you get a hump-shaped density.

If they're equal, it's centered right at 0.5, and as alpha and beta get bigger and

bigger it gets more peaked around 0.5. But, by allowing alpha to be bigger than

beta or beta to be bigger than alpha, you can get this to be a distribution that's

skewed towards zero or skewed towards one. So, you can get quite a few shapes from

the beta density by playing around with alpha and beta.
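A small numerical illustration of those special cases (a sketch, not from the lecture):

```python
import math

def beta_pdf(p, a, b):
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

# alpha = beta = 1: the uniform density, constant at 1
print([round(beta_pdf(p, 1, 1), 6) for p in (0.1, 0.5, 0.9)])  # [1.0, 1.0, 1.0]

# alpha = 1, beta = 2: a line sloping down, f(p) = 2 * (1 - p)
print(round(beta_pdf(0.25, 1, 2), 6))   # 1.5

# alpha = 2, beta = 1: a line sloping up, f(p) = 2 * p
print(round(beta_pdf(0.25, 2, 1), 6))   # 0.5

# alpha = beta = 2: a hump centered at 0.5, f(p) = 6 * p * (1 - p)
print(round(beta_pdf(0.5, 2, 2), 6))    # 1.5
```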

So, if you're Bayesian, what you need to do is you need to pick values of alpha and

beta that represent where the shape of the density represents your beliefs about the

parameter p. And then, once you do that, you can
start doing Bayesian analysis. So, here on the next slide, we need to

choose values of alpha and beta so that the beta prior's indicative of our degree

of belief regarding p in the absence of data.

And then, we're going to use the rule that the posterior is the likelihood times the

prior. And again, because we're talking about

constants of proportionality, we'll throw out anything that doesn't depend on p.

So, in this case, the posterior is proportional to the likelihood, which is p

to the x, one minus p to n minus x. And here, when I say proportional to, I

mean proportional to in the parameter, p. So, p to the x, one minus p to n minus x,

that's the likelihood. And we throw out the binomial coefficient, n choose x,
because that doesn't depend on p. And then, we have p to the alpha minus

one, one minus p to the beta minus one. And we throw out the ratio of gamma

functions because that doesn't depend on p.

Now, we multiply those together and we get p to the x plus alpha minus one, one minus

p to the n minus x plus beta minus one, and the posterior is a density, that has

this form. Now, we know it's proportional to that.

But remember, it's proportional to that but look,

This density that we see here is exactly just a beta density,

Right? It's p raised to some power minus one, one

minus p raised to another power minus one. In fact, the alpha is just now x plus the

prior alpha, and the beta is just the number of failures plus the prior beta.
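You can check this conjugacy numerically. Here's a sketch in Python (not from the lecture); the x = 13 and n = 20 anticipate the running example later on, and the Beta(2, 2) prior is an arbitrary choice:

```python
import math

def beta_pdf(p, a, b):
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

x, n = 13, 20          # successes and trials
a, b = 2.0, 2.0        # an arbitrary Beta(alpha, beta) prior

def unnorm_posterior(p):
    # binomial likelihood kernel times the beta prior kernel
    return (p ** x * (1 - p) ** (n - x)) * (p ** (a - 1) * (1 - p) ** (b - 1))

# If the posterior really is Beta(x + alpha, n - x + beta), then the ratio of
# the unnormalized posterior to that beta density is the same constant at every p.
ratios = [unnorm_posterior(p) / beta_pdf(p, x + a, n - x + b)
          for p in (0.2, 0.4, 0.6, 0.8)]
print(all(abs(r - ratios[0]) < 1e-9 * ratios[0] for r in ratios))   # True
```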

So, we could even tell you what the ratio of gamma functions you would have to have

to make this posterior a proper density, to normalize the posterior.

But, we don't need to do any calculations or integrals to do that.

We can do that just by looking at it and saying, oh well, if I take a binomial

likelihood and multiply it times a beta prior and think of that as a posterior

density, Then that posterior density has exactly

the form of the kind of core part of a beta density so that I know it's a beta

density. So, if the posterior is a beta density

with parameter alpha tilde equal to x plus alpha and beta tilde equal to n minus x

plus beta, we know lots of its properties. As an example, we know what the posterior

mean is. So, what do I mean by posterior mean?

So, the posterior is the distribution of the parameter given the data,

Right? So, the likelihood is the probability of

the data given the parameter. The prior is the probability of the

parameter disregarding the data. So, the posterior winds up being the

probability of the parameter given the data.

So, we can calculate, as an example, the expected value of the parameter p given

the data. And, because p is in the posterior, a beta

density, this works out to just be the expected value of a beta distribution

which we've learned earlier as being the alpha parameter divided by alpha plus

beta. So, in this case, it's alpha tilde divided

by alpha tilde plus beta tilde. Well, let's just plug in alpha tilde equal

to x plus alpha, and beta tilde equal to n minus x plus

beta. And here, I do some manipulations and show

that you can get down to the point where it works out to be x over n times n

divided by n plus alpha plus beta plus alpha over alpha plus beta times alpha

plus beta divided by n plus alpha plus beta,

Which is a mouthful, but let me go through each term.

X over n is the sample proportion. It's the MLE, it's p hat.

So, x over n, is p hat. Let's take this second term, n over n plus

alpha plus beta. That's a number that has to be between

zero and one because n is positive and alpha and beta are positive.

So, we have n divided by something that's bigger than n.

And then notice, okay, so we have this number that's between zero and one, let's

call it pi. Okay?

And then, alpha over alpha plus beta is the prior mean.

Okay? And then, this term right here, alpha plus

beta over n plus alpha plus beta, You can check yourself, that's one minus

pi where we defined pi just a second ago. So, this equation works out to be an

average of the MLE and the prior mean.
Okay? Now, it's not an average in the
sense that it puts weight 0.5 on each, right? It's a weighted average, where pi can be
anywhere between zero and one, and then, hence, one minus pi is its
complement. But that's exactly an average.
It's a weighted average of the MLE and the prior mean. So, let me try to state that in English.

The posterior mean is the average of the MLE and the prior mean.
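As a quick check of that statement (a sketch, again using the upcoming x = 13, n = 20 example with an arbitrary Beta(2, 2) prior):

```python
x, n = 13, 20          # the running example: 13 successes in 20 trials
a, b = 2.0, 2.0        # an arbitrary Beta(2, 2) prior

# Direct posterior mean of Beta(x + a, n - x + b)
post_mean_direct = (x + a) / (n + a + b)

# The same number as a weighted average of the MLE and the prior mean
mle = x / n                       # p hat
prior_mean = a / (a + b)
pi = n / (n + a + b)              # the weight on the MLE
post_mean_mixture = pi * mle + (1 - pi) * prior_mean

print(post_mean_direct)                                    # 0.625
print(abs(post_mean_direct - post_mean_mixture) < 1e-12)   # True
```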

Now, the average is a very specific kind of average where it weights the MLE

different than the prior mean. So, let's look at these weights.

So here, let's suppose n is really big. Then, what happens to n over n plus alpha

plus beta? Well, this term pi gets very big.

It gets much closer to one, and hence, one minus pi gets much closer to zero.

So then, when n is very big, this mixture weights the MLE a lot more than it weights

the prior mean. In other words, as you collect more data,

your prior means less and the data means more.

What happens on the other hand as alpha and beta get very big and n remains

constant? As alpha and beta get really big,
then we look at alpha plus beta over n plus alpha plus beta.
This one minus pi part gets closer and closer to one.
So, one minus pi gets very big, and pi gets very small.

So, what happens as alpha and beta get big?

What does that mean in terms of our prior?

Well, if you remember back from a couple of slides ago, the shape of the beta

density as alpha and beta got bigger and bigger, the shape of the beta density got

more concentrated around the mean. And what that entails is that it's saying
that our prior belief is a lot more confident in a specific value of p.

And so, what that implies is if we are incredibly certain in our prior, then that

swamps the data, Right?

If we're incredibly certain in our prior that swamps the data.

Our MLE has very little weight and our prior has a lot of weight.

And this actually explains a lot of politics for you, for example,

Right? So here, your opinion is a mixture of the

data and your prior beliefs. If you're immovable off your prior

beliefs, then it doesn't matter how much data you collect,

Right? On the other hand, of course, in
this case, if your alpha and beta are quite low, then you wind up with the MLE
dominating the posterior mean. So, let me just rehash this because it's

an important point. The posterior mean is a mixture of the MLE,
p hat, and the prior mean, and pi goes to one as n gets large.

And for large n, the data swamps the prior and the MLE dominates.

For small n, then the prior mean dominates.

So, when you have very little information, you rely on your prior knowledge.

The idea behind Bayesian statistics is that it should sort of generalize how

science is ideally working. As data becomes increasingly available,

prior beliefs should matter less and less. And then, again, for a prior that is degenerate
at a value, so as alpha and beta go to infinity, you wind up with a prior that is
100% on a specific value of p. Then, no amount of data can overcome that prior.
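Both limits can be read directly off the weight pi = n / (n + alpha + beta); a tiny sketch (the particular numbers are just illustrative):

```python
a, b = 2.0, 2.0   # an arbitrary, fixed prior

# As n grows with the prior fixed, the weight pi on the MLE heads to one.
weights_n = [n / (n + a + b) for n in (10, 100, 10_000)]
print([round(w, 4) for w in weights_n])       # climbs towards 1

# As the prior "sample size" alpha + beta grows with n fixed at 20,
# pi heads to zero, so the prior mean dominates.
weights_prior = [20 / (20 + s) for s in (4.0, 40.0, 4_000.0)]
print([round(w, 4) for w in weights_prior])   # falls towards 0
```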

So, let's also look at the posterior variance.

The posterior variance takes a nifty form as well.

So, let's look at the variance of p given the data, the posterior variance.

So, p in absence of the data was beta with parameters alpha and beta.

P given the data, via the Bayesian calculation was also from a beta

distribution with parameters alpha tilde and beta tilde.

So, we can just calculate the variance directly, the variance from a beta with
alpha tilde and beta tilde plugged in for alpha and beta.

And here, you see I plug in for alpha and beta x plus alpha for alpha tilde,

And n minus x plus beta for beta tilde. And you get this form.

So, let me let p tilde equal x plus alpha over n plus alpha plus beta, and n tilde

equal n plus alpha plus beta. Then, you wind up with the variance of p

given x works out to be p tilde, one minus p tilde divided by n tilde plus one.
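That identity is easy to verify numerically (a sketch, using the x = 13, n = 20 example with an illustrative Beta(2, 2) prior):

```python
x, n = 13, 20
a, b = 2.0, 2.0

at, bt = x + a, n - x + b            # alpha tilde, beta tilde
# variance of a Beta(at, bt): at*bt / ((at + bt)^2 * (at + bt + 1))
var_beta = (at * bt) / ((at + bt) ** 2 * (at + bt + 1))

pt = (x + a) / (n + a + b)           # p tilde
nt = n + a + b                       # n tilde
var_shortcut = pt * (1 - pt) / (nt + 1)

print(abs(var_beta - var_shortcut) < 1e-12)   # True
```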

Which is interesting, because it's not quite but very similar to the binomial

variance, the binomial variance being p times one minus p over n.

And so, the sample binomial variance would be p hat times one minus p hat over n. So, it's
an awful lot like that. So, it takes this very, very convenient

form. And, in fact, let's go back to an earlier

point. If alpha and beta were both two, then the posterior mean works out to be p

tilde, x plus two divided by n plus four. And the posterior variance works out to be

p tilde one minus p tilde divided by n tilde plus one.

So, this is exactly the sample proportion that we used in Agresti-Coull interval and

the posterior is almost the same, with the exception of this plus one.

So, what's a plus one among friends? So, we'll just say, it's roughly the same

variance as the Agresti-Coull interval. So, this is one way to motivate the
Agresti-Coull interval: it is centered at the posterior mean, and
its standard error uses not exactly, but almost, the posterior variance.

So, you could view it as a normal approximation to a posterior interval. And

so, that's one way to motivate the Agresti-Coull interval is just to say

alpha and beta equals two from a Bayesian analysis, and you get something that's

very, very similar. So, let's go back to our previous example

and just do some of the Bayesian calculations.

Let's say x = thirteen and n = twenty.

So now, let's consider a uniform prior. Alpha equal beta equal one.

In that case, the prior is just one, a constant between zero and one.

What's interesting, in this case, about the uniform prior is that the posterior is
equal to the likelihood, Right? Because you have posterior equals
likelihood times prior, and in this case, the prior is just a constant one.

So, the posterior equals the likelihood. Now, you can't always get away with doing

this. This is particular to the fact that the

parameter that we're interested in is bounded between zero and one.

For example, if your parameter was anything between minus infinity and plus

infinity, you can't put a prior of one on that and have a finite integral.

Now, people have looked into that actually, and they said, well, maybe you

can do it. And, that's for later classes.

For this class, it's kind of nice to note that in this case, if we set alpha equal

to beta equal to one, we get a proper density, exactly a uniform density, and our

posterior is exactly equal to the likelihood,

Which is interesting. If instead, we were to set alpha equal to

beta equal to two, remember this prior just looks like a hump right at 0.5, then

the posterior works out to be p to the x plus one, one minus p to the n minus x

plus one. And so, the very classical way to do

Bayesian analysis is you say that the prior is sort of governed by expert

knowledge, and the likelihood then is, of course, the objective part that's governed

by the data. And, of course, to say that it's the

objective part is a little bit misleading because someone had to subjectively elect
to model the data as if it's binomial. So, there is of course, a subjective part

to the likelihood itself. But, you know, let's put that aside.

We have the supposedly objective part in the likelihood,

We have the subjective part in the prior, and then the posterior is the mixture,
how you update your subjective prior beliefs with the objective data.

That's the kind of classical Bayesian inference. But people said, well,

In many, many cases, people don't want statistics that depend
on expert opinions to start with. So, this idea of a subjective prior
is just not palatable to the idea of science.

So then, Bayesians went back and thought hard about it and they said well, maybe we
can come up with priors that are sort of go-to priors for us.

Things that we can just use where we don't have to think about how to specify the

prior, so-called objective priors. And because of that, the collection of

Bayesian techniques then sort of ballooned into a variety of different ways of
thinking about how to be a Bayesian. The only thing they have in common is that

they utilize the Bayesian machinery that the posterior is equal to the likelihood

times the prior. But then, they have lots of different ways

of thinking about it. And one way of thinking about it is the

so-called Jeffreys prior, where people said, well, maybe we can pick a prior that

has these specific mathematical properties.

And for this particular problem, the Jeffreys prior sets alpha equal to beta

equal to 0.5. The uniform prior is another nice one

that's somewhat objective because we could say, well, why don't we put a constant
prior? That way, the likelihood is the posterior,
and that seems pretty objective to me. There are problems with doing that. The

point is that uniformity on one scale is not uniformity
on another scale. So, the fact that the prior's uniform for
p means that it's not uniform for p squared, for example.
If you were to calculate the distribution of p squared, it's no longer uniform.

So, a uniform distribution doesn't adequately represent absence of belief.

The problem with that is there's no

probability density that measures absence of belief about a parameter.

If you've written down a density, you've specified belief.
You've completely characterized its probabilistic behavior. So, anyway,

these are very technical problems with Bayesian analysis and they all kind of

revolve around how in the world do we set this prior.

But, in this case, I think people would say the uniform prior seems pretty

reasonable, the Jeffreys prior seems pretty reasonable.

And putting a prior that's humped at 0.5 because, you know, shrinking everything

towards 0.5 also seems pretty reasonable. All those things don't seem so bad.

And the benefit is, no matter what you choose, as long as you gave someone
else the likelihood, they could pick a
different prior than you. So, the idea that you could just pass
around the likelihood, and everyone could pick their own prior, is also quite a
palatable way to do Bayesian inference. So, I'm going to go through some pictures

just to show you and I fudged a little bit.

I'll tell you how I fudged a little bit on the pictures.

So here, I normalized everything so that its peak is one.

But then, here, in this first one the problem is that the prior heads off to

infinity near zero and near one. So, if I were to normalize it that way, I
couldn't; I can't divide by infinity. So, I fudged a little bit.
So, this U-shaped curve looks different than it would at the correct scale.
In order to get it on the same plot, I
fudged a little bit. So, if you try to reproduce this, you'll see how
I fudged. But, okay. So, the U-shaped curve isn't to

the right scale, but I put it on the same scale as the posterior and the likelihood,
both of which I normalized so their peaks were at one.

So, the blue is the prior. In this case, the Jeffreys prior, alpha
equal to beta equal to 0.5. The green is the likelihood and the red is

the posterior. So, you see what happens when you multiply

the green times the blue and then re-normalize,

You get a red curve that looks an awful lot like the likelihood.

So, in this case, the Jeffreys prior doesn't move us off our likelihood very

much. And the posterior inference, which is

entirely based on this red curve, is pretty much exactly identical to the

likelihood. Then, of course, on the next slide, if the

prior is completely flat, the posterior and the likelihood are identical.

So, there is no green curve in this case, it's exactly underneath the red curve.

Now, let's look at alpha equal beta equal two.

Then, my prior is this hump shape at 0.5. You can see that my likelihood is the

green shape, and my posterior is the red shape.

And you can see it's ever so much shifted towards 0.5.

So, the red shape is the mathematical compromise between the knowledge codified

by my blue prior, And the objective part, codified by the

likelihood. Again, I should put objective in quotes.

Now, let's make it more extreme to kind of show you what's happening.

Let's put alpha = two and beta = ten. And then, the blue curve gets shifted a

lot towards zero. As beta gets much bigger than alpha, the prior becomes more pushed

up towards zero. As alpha becomes much bigger than beta, it

becomes pushed up towards one. And then, if alpha and beta are equal
and they get larger and larger, it gets more peaked around 0.5.

So anyway, now we're all pushed up towards zero.

And you can see, here we have the blue curve is the prior, pushed up toward zero

because beta is much larger than alpha. And it has a finite maximum because both

of them are bigger than one. And then, we have the green likelihood,
which has been constant through every one of these pictures. And then, we have the

red posterior which is the compromise between the evidence represented by our

data and the assumed likelihood, and our blue prior which represents our knowledge,

our prior knowledge. And so, the red curve is the appropriate

mathematical compromise between these two opposing positions.

And, in this case, let's say, you had a prior belief that prevalence of

hypertension was very low, you thought it was on the order of 0.1.

Your data says, no, no, no. It's very high.

It's on the order of 0.65, right? And so, your posterior is that compromise
to say, well, your data has moved me very far away from my prior towards the MLE of
0.65, and that's how the mathematics works out.

And, as n goes to infinity, this green curve, the likelihood, will get more and

more peaked around whatever the true value is, and it'll just grab this red curve and

pull it increasingly towards it. So, what happens in politics, for example?

Well, people have their blue curve is very spiked, right?

They're dead set in their opinions and no amount of data is going to move them off

of it. So here is an example where I have alpha
= 100 and beta = 100. What happens then?

Alpha and beta are equal so the beta distribution centered at exactly 0.5.

But, as alpha and beta goes to infinity, the variance of the beta distribution gets

really small. So, we're quite sure, according to our prior, that p is
exactly 0.5. So then, we collect our data, and it says,
ehh, I don't think so, p is not 0.5.

It's more likely somewhere above 0.6, Right?

And then, what happens to our posterior? Our posterior says, well, I don't know.

You were very sure. I'm not,

I'm going to kind of ignore the data because of how sure you were.

So, this is, of course, the problem with extremely informative priors, right?

No amount of data is going to knock you off them.

Here, the red curve almost overlaps with the blue curve.

So, the red curve in the previous examples is the posterior.

The posterior is the distribution of the parameter, given the data.

In Bayesian statistics, that's everything. If you give someone the posterior, that's

it. You've given them everything, that, that's

the summary of evidence as far as the Bayesian is concerned.

But it's a curve, it's hard to work with. You can only look at it in graphs.

And then, if you have multiple dimensions, it gets even worse.

So, you know, we want to summarize it. Well, one way to summarize that curve is

by its mean, Right? The associated mean, the posterior
mean. Another way to summarize it is by its
variance, the posterior variance. But we might want something analogous to a

confidence interval, but a confidence interval is a frequentist property.

It talks about supposed fictitious repetitions of experiments, that's not

within the Bayesian ideology really. So, we need something that's analogous to

a confidence interval. For the likelihood, we had something that

was analogous to a confidence interval and we called it a likelihood interval.

So, the Bayesians created something and they called it a credible interval.

The Bayesian credible interval is just an analog of a confidence interval.

So, a 95% credible interval, a to b, just satisfies that the probability that the
parameter lies in that interval, given the data, is 95%.

Really simple. You know, if you believe in the Bayesian inference, higher values of

the posterior represent kind of better supported values of the parameter.

So, just like the likelihood, you're better off chopping off the posterior with

the horizontal line and figuring out exactly what values of a and b that

entails to force it to be at 95%. And, that's called the highest posterior

density interval. And I have a picture here, where I kind of

do that. So, if you could imagine this horizontal

line, the red area would vary as we moved it up and down.

As we moved it down, the red area would get bigger and bigger. As we moved it up

and up, the red area would get smaller and smaller.

So, you want to keep moving that horizontal line up and down, until the red

area is exactly 95%, right? And this is a density, so that means the area under
the curve is exactly 0.95. So, once you hit that perfect point where

it's exactly 0.95, you can see where it intersects the curve.

And then, drop down to the horizontal axis,

And those two points are your a and b. So, the probability of p lies between that

a and b is, of course, just the integral between those points, which is exactly the

red area. So, you wind up with a credible interval.
In this case, it works out to be 0.44 to 0.84, which should be no surprise.

And, in R, you can do that with the binom package. In this case, binom.bayes(13, 20),
thirteen successes, twenty trials.
And you have to do type = "highest", and that gives you the 95% credible
interval. And, it uses a Jeffreys prior.
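For readers without R, here's a rough Python sketch of the same highest posterior density idea by grid search; this is not the binom package's algorithm, just the "keep the highest-density points until they hold 95% of the mass" recipe applied to the Jeffreys posterior Beta(13.5, 7.5):

```python
import math

def beta_pdf(p, a, b):
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

# Jeffreys posterior for x = 13 successes in n = 20 trials
at, bt = 13.5, 7.5

# Grid approximation to the HPD interval: sort grid points by posterior
# height, then keep the highest ones until they account for 95% of the mass.
m = 100_000
h = 1.0 / m
pts = [((i + 0.5) * h, beta_pdf((i + 0.5) * h, at, bt)) for i in range(m)]
pts.sort(key=lambda t: t[1], reverse=True)

mass, kept = 0.0, []
for p, f in pts:
    kept.append(p)
    mass += f * h
    if mass >= 0.95:
        break

lo, hi = min(kept), max(kept)
print(round(lo, 2), round(hi, 2))   # should land near the lecture's 0.44 to 0.84
```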

As I said earlier, Bayesian credible intervals,
even though they are constructed using Bayesian thinking,
if you turn around and evaluate their frequentist performance, they tend to

perform very well. Just like our Agresti-Coull interval which

wasn't exactly a Bayesian confidence interval but was close enough among

friends. That actually has much better performance
than the Wald interval constructed directly from the CLT.

The other thing I want to mention before I go through the final bit of this lecture

is that another way to create a credible interval would be to pick a to be the

lower 2.5th percentile of the posterior distribution.

And pick b to be the 97.5th percentile of the posterior distribution.

And that would give you exactly a 95% interval, for example.
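Computing that percentile (equal-tail) interval only needs the posterior quantile function; in R, qbeta would do this directly. Here's a self-contained sketch with a crude numerical CDF and bisection, again on the Jeffreys posterior Beta(13.5, 7.5):

```python
import math

def beta_pdf(p, a, b):
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

def beta_cdf(p, a, b, m=10_000):
    # midpoint-rule integral of the density from 0 to p
    h = p / m
    return sum(beta_pdf((i + 0.5) * h, a, b) * h for i in range(m))

def beta_quantile(q, a, b):
    # bisection on the (monotone) CDF
    lo, hi = 0.0, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

at, bt = 13.5, 7.5                   # Jeffreys posterior for x = 13, n = 20
a95 = beta_quantile(0.025, at, bt)
b95 = beta_quantile(0.975, at, bt)
print(round(a95, 2), round(b95, 2))  # equal-tail 95% credible interval
```

Note the two endpoints here sit at different posterior heights, which is exactly the drawback mentioned next.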

But, the posterior height of the lower point and the posterior height of the

upper point would be different. So that is potentially a problem.

On the other hand, if you do the HPD interval, you've got to vary this line.

You have to solve a root equation to obtain them.

So, it's a little bit annoying. And finding the
so-called percentile interval, the lower 2.5th percentile and the upper 97.5th
percentile, to get a 95% credible interval, for example, is very easy.

So, another way to construct a Bayesian credible interval is just to take the

lower and upper percentile and run with it that way.

I think you're better off doing the HPD interval if you can.

So, I want to end with one nice aspect of the Bayesian credible interval, if you're

hardcore about these things. So, let me just say for a minute about

what I mean by being hardcore. So, probably many of you have taken an

introductory statistics class. And probably many of you have seen the

baffling interpretation associated with frequentist confidence intervals presented

as a test question. And that is just, you know, kind of

hard-ball frequentist. And it's accurate, you know, I don't want

to criticize it, it's accurate. And so, here's an example.

We have a Wald interval, it works out to be 0.44 to 0.86. And let's assume that the

95% coverage of the Wald interval is good enough.

The CLT is kicked in, in this case, and we're fine.

And we're not worried about the mathematical performance of the confidence

interval. We're, we're interested in the, just the

strict interpretation assuming that the coverage is correct.

Then, the fuzzy interpretation is that we're 95% confident that p lies between

0.44 to 0.86. But, that's not the actual interpretation.

The actual interpretation is the interval 0.44 to 0.86 was constructed such that in

repeated independent experiments, 95% of the intervals obtained would contain p.

That's the actual confidence interval interpretation.

It's this idea that frequentist refers to frequency, i.e.,
the definition of probability being entirely entwined with fictitious
repetitions of experiments. Or, you know, lifetime batting averages for success
probabilities and that sort of thing. That's what the frequency interpretation is.
But almost exactly no one interprets a frequentist
confidence interval this way because it's such a mouthful.
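That repeated-experiments notion can at least be simulated. A sketch (not from the lecture), with an arbitrary true p = 0.65 and n = 20; note that Wald coverage at this sample size is known to run somewhat below the nominal 95%:

```python
import math
import random

random.seed(1)

def wald_covers(p_true, n):
    # one fictitious repetition: draw binomial data, build the 95% Wald
    # interval, and report whether it contains the true p
    x = sum(random.random() < p_true for _ in range(n))
    phat = x / n
    se = math.sqrt(phat * (1 - phat) / n)
    return phat - 1.96 * se <= p_true <= phat + 1.96 * se

reps = 20_000
coverage = sum(wald_covers(0.65, 20) for _ in range(reps)) / reps
print(round(coverage, 3))   # close to, though typically a bit under, 0.95
```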

Everyone kind of thinks, well, my interval 0.44 to 0.86 is an interval
that accounts for uncertainty at a, kind of, control rate of about 95%, where that

control rate has a contextual meaning with respect to frequentist statistics. And,

and I understand that, but I don't spit it out every time I interpret the confidence

interval. Every now and then, a confidence interval

makes its way into the news, and news people never interpret it right because

it's hard to interpret. So, a likelihood interval, let's go on to

the next one. The likelihood interval was 0.42 to 0.84,

the 1/8th likelihood interval. And, the fuzzy interpretation for the
likelihood interval was that the interval 0.42 to 0.84 represents plausible values
of p. Here, plausible is defined by the eightfold likelihood ratio associated with the
end points, relative to the MLE. So, yeah, that's okay.

And so, the fuzzy interpretation is okay, it's no worse than the frequentist fuzzy

interpretation. But the actual interpretation, let's go
through it: the interval 0.42 to 0.84 represents plausible values for p,
in the sense that for each point in the interval, there is no other point that is

more than eight times better supported given the data.

Again, yikes. You know, this is a mouthful and, you

know, anyone who constructs a likelihood interval is not going to interpret that

way. They're going to say,

You know, it's an interval, it accounts for uncertainty, it's based on the

likelihood, the calibration is based on sort of eightfold likelihood ratios, and I

understand what it means, but I don't spit it out every time I use the interval.

The nice thing about the Bayesian interval is that you can spit out the actual

interpretation every single time you use it because the interpretation's very easy.

So, the Jeffreys 95% credible interval was 0.44 to 0.84.

The actual interpretation is the probability that p lies between 0.44 and

0.84 is 95%, full stop.
So, that's super easy. Now, there's a lot loaded in this word

probability here because it's the Bayesian version of the word probability that maybe

not everyone would like to agree with; and some would prefer something
that's more objective, or something like that.

But nonetheless, if you're willing to buy into the Bayesian way of thinking, the

simple interpretation of the credible intervals is quite nice. And this

interpretation is, you know, if you see a confidence interval in the news or if you

present a confidence interval to people who have just a little bit of statistics,

this is how they want to interpret confidence intervals.

And you can't say this statement for a frequentist interval.