In this lecture, we introduce the concept of conjugacy in Bayesian statistics.

Suppose you believe that your data come from a binomial distribution with known n

but unknown p, and suppose your prior belief about p

has the beta probability density function with parameters α and β.

If you observe x successes in n trials,

it turns out that Bayes' rule implies that

your new belief about p also has a beta density,

but now with parameters α + x and β + n - x.

This is an example of conjugacy.
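The update just described can be sketched in a few lines of Python; the prior parameters and the data below are illustrative values, not from the lecture.

```python
# Minimal sketch of the beta-binomial conjugate update.
# alpha, beta_, n, x are illustrative, not from the lecture.
alpha, beta_ = 2.0, 3.0   # prior: p ~ Beta(alpha, beta_)
n, x = 10, 7              # data: x successes in n trials

# Conjugacy: the posterior is Beta(alpha + x, beta_ + n - x).
post_alpha = alpha + x        # 9.0
post_beta = beta_ + n - x     # 6.0

# The posterior mean moves from the prior mean toward the sample proportion x/n.
prior_mean = alpha / (alpha + beta_)                  # 0.4
post_mean = post_alpha / (post_alpha + post_beta)     # 0.6
```

No integration is needed: the update is pure arithmetic on the parameters.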

Conjugacy occurs when your new belief,

that is, your posterior distribution, is in

the same family of probability density functions as your prior belief,

but with new parameter values which have been

updated to reflect what you have learned from the data.

Why are the beta and binomial families conjugate?

To see this, we need to do a bit of math.

Recall the form of Bayes' rule that is used for discrete random variables.

This formula does not apply to continuous random variables,

such as p with its beta prior,

since the denominator sums over all possible values of the random variable,

but p can take any value between zero and one.

Integration is just a generalization of summation.

And so, we need the version of Bayes' rule that applies to continuous random variables.

Notice that it's analogous to the discrete form.

The integral in the denominator is like a sum,

and it equals some constant that ensures that the total area under the curve,

that is, under the posterior density function, equals one.
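For the beta-binomial case, this denominator can also be evaluated numerically; here is a sketch with illustrative parameter values (not from the lecture), using a midpoint Riemann sum for the integral.

```python
# Continuous Bayes' rule for the beta-binomial case, with the
# denominator approximated by a midpoint Riemann sum.
# alpha, beta_, n, x are illustrative, not from the lecture.
import math

alpha, beta_, n, x = 2.0, 3.0, 10, 7

def prior(p):
    """Beta(alpha, beta_) density, written with the gamma function."""
    B = math.gamma(alpha) * math.gamma(beta_) / math.gamma(alpha + beta_)
    return p ** (alpha - 1) * (1 - p) ** (beta_ - 1) / B

def likelihood(p):
    """Binomial probability of x successes in n trials when P(success) = p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Denominator: integral of likelihood * prior over (0, 1),
# approximated on a fine midpoint grid.
m = 100_000
grid = [(i + 0.5) / m for i in range(m)]
denom = sum(likelihood(p) * prior(p) for p in grid) / m

def posterior(p):
    return likelihood(p) * prior(p) / denom

# By conjugacy, posterior(p) matches the Beta(alpha + x, beta_ + n - x) density.
```

In practice conjugacy lets us skip this numerical step entirely, as the rest of the derivation shows.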

Also, note that the first term in

the numerator is the probability of observing your data,

given a specific value of p,

and the second term is the density for your prior belief about p.

In the beta-binomial case,

the probability of observing x heads in n tosses,

when the probability of success is p,

is given by the first term in the numerator.

And if you have a beta prior with parameters α and β,

then the density for p is the second term in the numerator.

So if we use the continuous version of Bayes' rule,

we find that the posterior distribution is given by the formula in the last line.

But we recognize this. Look at the numerator.

We can collect the terms that involve p,

finding p raised to the α + x - 1 power,

and 1 - p raised to the β + n - x - 1 power.

Everything else is just a constant.

That constant must take the unique value that is needed to ensure that

the area under the curve between zero and one equals one.

So it has to be the normalizing constant of the beta density with parameters α + x and β + n - x.

This is a cute trick. We can find the answer without doing

the integral simply by looking at the form of the numerator.
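The trick can be checked numerically; a short sketch with illustrative parameter values (not from the lecture) follows.

```python
# Sketch of the "recognize the kernel" trick: keep only the factors of the
# numerator that involve p, and the normalizing constant is forced to be
# 1 / B(alpha + x, beta_ + n - x).  Illustrative values, not from the lecture.
import math

alpha, beta_, n, x = 2.0, 3.0, 10, 7
a_post, b_post = alpha + x, beta_ + n - x

def kernel(p):
    """Unnormalized posterior: just the factors that involve p."""
    return p ** (a_post - 1) * (1 - p) ** (b_post - 1)

# The beta function B(a_post, b_post) is exactly the area under this kernel,
# so dividing by it makes the total area one -- no new integral is required.
B = math.gamma(a_post) * math.gamma(b_post) / math.gamma(a_post + b_post)

# Numerical check: the area under kernel(p) / B over (0, 1) is one.
m = 100_000
area = sum(kernel((i + 0.5) / m) / B for i in range(m)) / m
```

The midpoint sum here only verifies the claim; the point of the trick is that the constant is known without computing any integral.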

Without conjugacy, one has to do the integral.

Often, the integral is impossible to evaluate in closed form.

That obstacle is the primary reason

that most statistical theory in the 20th century was not Bayesian.

The situation didn't change until

modern computing allowed researchers to compute integrals numerically.

What have we learned?

We have learned that there are some families of densities that form conjugate pairs,

which means that if the data come from the first of those families,

and your belief about the unknown parameter has a distribution from the second family,

then after observing your data,

your new belief, your posterior density, is also a member of the second family,

but with different parameter values.

We explored this in the context of the beta-binomial conjugate families.

And we saw that conjugacy meant that we could apply

the continuous version of Bayes' rule without having to do any integration.