Today, we'll describe priors that are constructed as a mixture of conjugate priors, in particular, the Cauchy distribution. Because these are no longer conjugate priors, nice analytic expressions for the posterior distribution are not available. However, we can use a simulation method called Markov chain Monte Carlo (MCMC) for posterior inference.

In many situations, we may have reasonable prior information about the mean mu, but be less confident about how many observations our prior beliefs are equivalent to. We can address this uncertainty in the prior sample size n_0 through an additional prior distribution on n_0, via a hierarchical prior. We will use a gamma distribution with mean one over r squared, so if r is one, this corresponds to a prior expected sample size of one. The marginal prior distribution of mu can be obtained via integration: it is a Cauchy distribution centered at the prior mean, with scale parameter sigma times r.

The Cauchy distribution does not have a mean or standard deviation, but the center (or location) and the scale play a similar role to the mean and standard deviation of the normal distribution. The Cauchy distribution is a special case of a Student t distribution with one degree of freedom. As the plot shows, the standard Cauchy distribution with r equal to one and the standard normal distribution are centered at the same location, but the Cauchy distribution is said to have heavier tails, that is, more probability on extreme values than the normal distribution with the same scale parameter sigma. Cauchy priors were recommended by Sir Harold Jeffreys as a default objective prior for both estimation and testing.

The Cauchy prior is not a conjugate prior, and therefore the posterior distribution of mu given sigma is not Cauchy or any well-known distribution. However, the conditional distribution of mu and sigma, given n_0 and the data, is normal-gamma and easy to simulate from, as we saw in the Monte Carlo video.
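To make the mixture representation concrete, here is a minimal Python sketch (the course itself works in R) of the scale-mixture fact stated above: if n_0 has a gamma distribution with shape 1/2 and rate r squared over 2 (so mean one over r squared), and mu given n_0 is normal with mean m_0 and variance sigma squared over n_0, then marginally mu is Cauchy with location m_0 and scale r times sigma. The simulation checks the heavy-tail claim by comparing an empirical tail probability with the exact Cauchy value.

```python
import math
import random

random.seed(42)

def draw_mu_marginal(m0, sigma, r):
    """One draw from the marginal prior of mu:
    n0 ~ Gamma(shape 1/2, rate r^2/2), then mu | n0 ~ N(m0, sigma^2/n0).
    Marginally, mu ~ Cauchy(location m0, scale r*sigma)."""
    # random.gammavariate takes (shape, scale); scale = 1/rate = 2/r^2
    n0 = random.gammavariate(0.5, 2.0 / r**2)
    return random.gauss(m0, sigma / math.sqrt(n0))

m0, sigma, r = 35.0, 1.0, 1.0  # prior location from the tap water example
draws = [draw_mu_marginal(m0, sigma, r) for _ in range(200_000)]

# Fraction of mass beyond 3 scale units from the center: a Cauchy puts
# about 0.205 there, while a normal with the same scale puts only ~0.0027.
frac = sum(abs(x - m0) > 3 * r * sigma for x in draws) / len(draws)
cauchy_tail = 1 - 2 / math.pi * math.atan(3)  # exact Cauchy tail probability
print(frac, cauchy_tail)
```

The empirical fraction should land close to the exact Cauchy tail probability, illustrating why the Cauchy prior places far more weight on extreme values than a normal prior with the same scale.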
The conditional distribution of n_0, given mu, sigma, and the data, is a gamma distribution, also easy to simulate from given mu and sigma. It turns out that if we alternate generating Monte Carlo samples from these conditional distributions, the sequence of samples converges to samples from the joint distribution of mu, sigma, and n_0 as the number of simulated values increases. The Monte Carlo algorithm we've just described is a special case of Markov chain Monte Carlo known as the Gibbs sampler.

Let's look at the pseudocode for the algorithm. We start with initial values of each of the parameters for i equal to one. In theory, these can be completely arbitrary, as long as they are allowed values for the parameters. For each iteration i of the algorithm, we cycle through generating each parameter given the current values of the other parameters. The functions p_mu, p_sigma2, and p_n_0 return a simulated value from the respective distribution conditional on the inputs. You should note that whenever we update a parameter, we use the new value in the subsequent steps, as in the draws for sigma and n_0. We repeat this until we reach iteration S, leading to a dependent sequence of S draws from the joint posterior distribution.

We will use MCMC to generate samples under the Cauchy prior for the tap water example. For the Cauchy prior, we'll use 35 as the location parameter and r equal to one. To complete our prior specification, we'll use the Jeffreys reference prior on sigma squared. This combination is referred to as the Jeffreys-Zellner-Siow Cauchy prior, or "JZS", and is available in the BayesFactor and statsr packages, which we'll use to simplify the analysis. For energetic learners who want to code the MCMC algorithm themselves, more details are in the online supplements for the course. Using the bayes_inference function from the statsr package, we can obtain summary statistics from the MCMC output, not only for mu, but also inference about sigma and the prior sample size.
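For those who want to see the Gibbs sampler written out, here is a minimal Python sketch (the course uses R; the full derivation is in the online supplements). It assumes the hierarchical model above with the reference prior on sigma squared, under which the full conditionals work out to a normal for mu, an inverse-gamma for sigma squared, and a gamma for n_0; the data vector `y` is a simulated stand-in for the tap water measurements, not the actual data.

```python
import math
import random

random.seed(1)

def gibbs(y, m0=35.0, r=1.0, S=5_000):
    """Gibbs sampler for the hierarchical model (assumed conditionals):
      y_i | mu, sigma2  ~ N(mu, sigma2)
      mu  | sigma2, n0  ~ N(m0, sigma2 / n0)
      n0                ~ Gamma(shape 1/2, rate r^2/2)
      p(sigma2) proportional to 1/sigma2   (reference prior)
    """
    n = len(y)
    ybar = sum(y) / n
    # starting values: arbitrary in theory, but must be allowed values
    mu = ybar
    sigma2 = sum((yi - ybar) ** 2 for yi in y) / n
    n0 = 1.0
    draws = []
    for _ in range(S):
        # p_mu: normal full conditional, shrinking ybar toward m0
        mu = random.gauss((n * ybar + n0 * m0) / (n + n0),
                          math.sqrt(sigma2 / (n + n0)))
        # p_sigma2: inverse-gamma; draw the precision 1/sigma2 as a gamma
        rate = 0.5 * (sum((yi - mu) ** 2 for yi in y) + n0 * (mu - m0) ** 2)
        sigma2 = 1.0 / random.gammavariate((n + 1) / 2, 1.0 / rate)
        # p_n_0: gamma with shape 1; note it uses the NEW mu and sigma2
        n0_rate = r**2 / 2 + (mu - m0) ** 2 / (2 * sigma2)
        n0 = random.gammavariate(1.0, 1.0 / n0_rate)
        draws.append((mu, sigma2, n0))
    return draws

# simulated stand-in data with mean near the observed TTHM sample mean
y = [random.gauss(55.5, 10.0) for _ in range(28)]
post = gibbs(y)
mu_hat = sum(d[0] for d in post) / len(post)
```

Each update plugs in the most recently generated values of the other parameters, exactly as in the pseudocode: after S iterations we have a dependent sequence of S draws from the joint posterior.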
The posterior mean under the JZS model is much closer to the sample mean than under the normal-gamma prior used previously. Under the informative normal-gamma prior, the sample mean of 55.5 is about eight prior standard deviations above the prior mean, a surprising value under the normal prior. Under the Cauchy prior, the informative prior location has much less influence, a noted robustness property of the Cauchy prior: when the prior location is not close to the sample mean, the posterior puts more weight on the sample mean than on the prior mean. We can see that the central 50% interval for n_0 is well below the value 25 used in the normal prior, which placed almost equal weight on the prior and sample means.

Using the MCMC draws of mu and sigma, we can obtain Monte Carlo samples from the predictive distribution of Y by plugging mu and sigma into the rnorm function in R, as we did in the Monte Carlo video. The plot shows the posterior densities estimated from the simulated values of mu and the predictive draws of TTHM under the Jeffreys-Zellner-Siow prior, and under the informative normal prior for mu with n_0 equal to 25 and the reference prior on sigma squared.

To recap, we've shown how to create more flexible prior distributions, such as the Cauchy distribution, using mixtures of conjugate priors. As the posterior distributions are not available in closed form, we showed how MCMC can be used for inference under the hierarchical prior distribution. Starting in the late 1980s, MCMC algorithms led to an exponential rise in the use of Bayesian methods, as complex models built through hierarchical distributions suddenly became tractable. The Cauchy prior is well known for being robust to prior mis-specification, for example, a prior mean that is far from the observed sample mean. This provides an alternative to the reference prior as a default or objective distribution that is proper.
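The plug-in step for the predictive distribution is simple enough to sketch. Here is a Python analogue of the rnorm plug-in described above; the posterior draws of mu and sigma squared are hypothetical placeholder values, invented only to illustrate the mechanics, not output from the actual tap water analysis.

```python
import math
import random

random.seed(7)

# Hypothetical MCMC output: (mu, sigma2) pairs standing in for the draws
# that bayes_inference would return in R. The values are made up for
# illustration: mu near 55, sigma2 with mean near 100.
posterior_draws = [(random.gauss(55.0, 1.5), random.gammavariate(10.0, 10.0))
                   for _ in range(10_000)]

# One predictive draw of Y per posterior draw: Y | mu, sigma2 ~ N(mu, sigma2).
# This is the Python analogue of plugging mu and sigma into rnorm in R.
y_pred = [random.gauss(mu, math.sqrt(s2)) for mu, s2 in posterior_draws]
```

Because each predictive draw uses a different (mu, sigma) pair, the resulting sample reflects both the sampling variability of Y and the posterior uncertainty about the parameters.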
In the next series of videos, we'll return to Bayes factors and hypothesis testing, where the Cauchy prior plays an important role.