Let's consider the case of exponential data. For example, suppose you're waiting for a bus that you think comes on average once every ten minutes, but you're not sure exactly how often it comes. We'll say Y falls in exponential distribution, with right lambda. Your waiting time has a prior expectation of one over lambda. It turns out, the gamma distribution is conjugate for an exponential likelihood. Gammas actually are conjugate for a number of different things. We need to specify a prior, so a particular gamma in this case. If we think that the buses come on average every ten minutes, that's a rate of one over ten. So our prior mean should be one-tenth. Thus, we'll want to specify a gamma distribution, this first parameter divided by a second parameter is one-tenth. We can think that about our variability. Perhaps you want to specify a prior, which is a gamma, 100, 1000. This will indeed have prior mean of one-tenth, and it'll have a prior standard deviation, Of 1/100. Thus if we think as a rough approximate of a mean plus or minus two standard deviations as a rough interval for our prior, we would say we're looking at 0.1 plus or minus 0.02 as a possible range for this rate parameter. So, here's a particular prior we could specify. Suppose that we wait for 12 minutes, and a bus arrives. Now, you want to update you posterior for lambda about how often this bus will arrive. Posterior for lambda given y is proportional to the likelihood times the prior. In this case that's lambda e to the minus lambda y, lambda to the alpha- 1, e to the minus beta lambda. Doing a little bit of simplification, we get lambda to the alpha + 1- 1, e to the beta plus y, lambda, negative. I'm writing this as alpha + 1,- 1, so that it is in the form of a gamma distribution. We can now say that lambda given y the posterior follows the gamma distribution of parameters alpha + 1 and beta plus y. Plugging in our particular prior thus gives us a posterior for lambda which is a gamma with parameters 101 and 1012. Thus our posterior mean is going to be 101/1012. This is equal to 0.0998 or approximately over 1/10.02. This one observation doesn't contain a lot of data under this likelihood. Our prior has a fair amount of information in it, and so our posterior doesn't shift a whole lot. When the bus comes and it takes 12 minutes instead of 10, it barely shifts our posterior mean up. It shifts it up a little bit, so we now think our expected waiting time is slightly more than ten minutes, but not much more. One data point doesn't have a big impact here.