In this class, we will discuss the Gibbs sampler for obtaining posterior samples of the model parameters, as well as in-sample posterior predictions. Last time, for the mixture model, we derived the full posterior distribution of the model parameters, which has this large form. We will start from there and find the full conditional distribution of each parameter.

Let us start with the weight vector $\omega$. We write its full conditional as $p(\omega \mid \cdots)$, using the three dots to represent conditioning on everything else. Remember, to calculate a full conditional distribution, we select from the full posterior distribution the specific terms that contain the variable of interest, and try to recognize those terms as forming the kernel of a family that we know. For the vector $\omega$, there are two terms containing $\omega$, so this is proportional to

$$\prod_{k=1}^{K} \omega_k^{\sum_{t=p+1}^{T} \mathbb{1}(L_t = k)} \times \prod_{k=1}^{K} \omega_k^{a_k - 1}.$$

Combining terms, this is proportional to

$$\prod_{k=1}^{K} \omega_k^{\sum_{t=p+1}^{T} \mathbb{1}(L_t = k) + a_k - 1},$$

which is the kernel of a Dirichlet distribution. So the full conditional distribution of $\omega$ is just a Dirichlet distribution whose parameters run from $\sum_{t=p+1}^{T} \mathbb{1}(L_t = 1) + a_1$ for the first component to $\sum_{t=p+1}^{T} \mathbb{1}(L_t = K) + a_K$ for the last.

Next, let's derive the full conditional distribution of the configuration variable $L_t$; that is, we want the probability mass $P(L_t = k \mid \cdots)$. If we take a careful look at the joint posterior distribution, there are only two terms related to $L_t$. One is from the first product: it runs over many observations, but only the factor for $y_t$ contains $L_t$, so we get the normal density $N(y_t \mid \mathbf{f}_t' \boldsymbol{\beta}_k, \nu)$. The other is from the second product: again it runs over many observations, but only the factor for observation $t$ is related to this probability, so we get $\omega_k$. In practice, conditioning on $\omega$, $\boldsymbol{\beta}$, and $\nu$, we can compute this product explicitly for $k = 1, \dots, K$. Therefore, the full conditional distribution of $L_t$ is just a discrete distribution on $\{1, \dots, K\}$, with

$$P(L_t = k \mid \cdots) = \frac{\omega_k \, N(y_t \mid \mathbf{f}_t' \boldsymbol{\beta}_k, \nu)}{\sum_{j=1}^{K} \omega_j \, N(y_t \mid \mathbf{f}_t' \boldsymbol{\beta}_j, \nu)},$$

where the denominator is the normalizing constant.
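To make these two updates concrete, here is a minimal sketch of the corresponding Gibbs steps in Python. The setup is an assumption for illustration, not notation from the lecture: `y` holds the $n = T - p$ observations, the rows of `F` are the regression vectors $\mathbf{f}_t'$, the rows of `beta` are the current $\boldsymbol{\beta}_k$, `a` is the Dirichlet prior vector, and components are indexed $0, \dots, K-1$ rather than $1, \dots, K$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def sample_omega(L, a):
    """Full conditional of omega: Dirichlet with k-th parameter
    sum_t 1(L_t = k) + a_k."""
    counts = np.bincount(L, minlength=len(a))   # number of t with L_t = k
    return rng.dirichlet(counts + a)

def sample_L(y, F, omega, beta, nu):
    """Full conditional of each L_t: discrete on the K components, with
    P(L_t = k) proportional to omega_k * N(y_t | f_t' beta_k, nu)."""
    means = F @ beta.T                          # (n, K) matrix of f_t' beta_k
    dens = norm.pdf(y[:, None], loc=means, scale=np.sqrt(nu))
    probs = omega * dens                        # unnormalized weights
    probs /= probs.sum(axis=1, keepdims=True)   # divide by the normalizing constant
    return np.array([rng.choice(len(omega), p=p_t) for p_t in probs])
```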
Now, let's look at the full conditional distribution of $\nu$ given all the others, $p(\nu \mid \cdots)$. Three terms in the joint posterior distribution contain $\nu$. We first write these three terms down:

$$p(\nu \mid \cdots) \propto \prod_{k=1}^{K} \prod_{t:\, L_t = k} N(y_t \mid \mathbf{f}_t' \boldsymbol{\beta}_k, \nu) \times \prod_{k=1}^{K} N(\boldsymbol{\beta}_k \mid \mathbf{m}_0, \nu \mathbf{C}_0) \times IG(\nu \mid n_0/2, d_0/2).$$

Now, if we write this product explicitly: the first product is a product of independent normal densities, and remember there are $n = T - p$ of them in total. So, after removing constants, it contributes $\nu^{-(T-p)/2}$, and in the exponential the product becomes a sum, with $2\nu$ in the denominator:

$$\exp\left\{-\frac{1}{2\nu} \sum_{k=1}^{K} \sum_{t:\, L_t = k} (y_t - \mathbf{f}_t' \boldsymbol{\beta}_k)^2\right\}.$$

This part is from the normal densities of the data. Then we need to multiply by the second part. Each $\boldsymbol{\beta}_k$ is $p$-dimensional, so each multivariate normal density contributes $\nu^{-p/2}$; because there are $K$ of them, we get $\nu^{-pK/2}$, and in the exponential the product again becomes a sum:

$$\exp\left\{-\frac{1}{2\nu} \sum_{k=1}^{K} (\boldsymbol{\beta}_k - \mathbf{m}_0)' \mathbf{C}_0^{-1} (\boldsymbol{\beta}_k - \mathbf{m}_0)\right\}.$$

Finally, we multiply by the inverse gamma density, which contributes $\nu^{-n_0/2 - 1} \exp\{-d_0/(2\nu)\}$. Now we just need to combine terms. The total power of $\nu$ is $-[(T-p) + pK + n_0]/2 - 1$. For the exponential term, the denominator is $2\nu$, and the numerator collects: first, the sum $\sum_{k=1}^{K} \sum_{t:\, L_t = k} (y_t - \mathbf{f}_t' \boldsymbol{\beta}_k)^2$; then the sum $\sum_{k=1}^{K} (\boldsymbol{\beta}_k - \mathbf{m}_0)' \mathbf{C}_0^{-1} (\boldsymbol{\beta}_k - \mathbf{m}_0)$; and finally $d_0$. Now we can recognize this function as the kernel of an inverse gamma distribution. In conclusion, the full conditional distribution of $\nu$ is an inverse gamma distribution with parameters $n^*/2$ and $d^*/2$. Here $n^* = (T - p) + pK + n_0$, and $d^*$ is just this big sum:

$$d^* = \sum_{k=1}^{K} \sum_{t:\, L_t = k} (y_t - \mathbf{f}_t' \boldsymbol{\beta}_k)^2 + \sum_{k=1}^{K} (\boldsymbol{\beta}_k - \mathbf{m}_0)' \mathbf{C}_0^{-1} (\boldsymbol{\beta}_k - \mathbf{m}_0) + d_0.$$

This is the full conditional distribution of $\nu$.
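A matching sketch of this step, under the same illustrative setup as above; `C0_inv` (a precomputed $\mathbf{C}_0^{-1}$), `m0`, `n0`, and `d0` are assumed names for the prior quantities.

```python
def sample_nu(y, F, beta, L, m0, C0_inv, n0, d0):
    """Full conditional of nu: inverse gamma with parameters n*/2 and d*/2."""
    n, K = len(y), beta.shape[0]              # n = T - p observations
    p = beta.shape[1]
    resid = y - np.sum(F * beta[L], axis=1)   # y_t - f_t' beta_{L_t}
    dev = beta - m0                           # rows are beta_k - m0
    n_star = n + p * K + n0
    d_star = (resid @ resid                   # sum of squared residuals
              + np.sum((dev @ C0_inv) * dev)  # sum_k (beta_k - m0)' C0^{-1} (beta_k - m0)
              + d0)
    # A draw from IG(n*/2, d*/2) is the reciprocal of a Gamma(n*/2, rate d*/2) draw.
    return 1.0 / rng.gamma(shape=n_star / 2.0, scale=2.0 / d_star)
```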
Finally, for $\boldsymbol{\beta}_k$, we have the full conditional distribution $p(\boldsymbol{\beta}_k \mid \cdots)$. Looking at the joint posterior, there are two terms containing $\boldsymbol{\beta}_k$, so this is proportional to

$$\prod_{t:\, L_t = k} N(y_t \mid \mathbf{f}_t' \boldsymbol{\beta}_k, \nu) \times N(\boldsymbol{\beta}_k \mid \mathbf{m}_0, \nu \mathbf{C}_0).$$

We can view this as a single-component AR(p) model that uses only the part of the data satisfying the criterion $L_t = k$. Therefore, if we denote by $\tilde{\mathbf{y}}_k$ the vector of those $y_t$ such that $L_t = k$, by $\tilde{\mathbf{F}}_k$ the corresponding design matrix for $\tilde{\mathbf{y}}_k$, and by $n_k = \sum_{t=p+1}^{T} \mathbb{1}(L_t = k)$ the number of elements of $\tilde{\mathbf{y}}_k$, then, using the conclusions about the AR(p) model from Module 1, we immediately have the full conditional distribution of $\boldsymbol{\beta}_k$: a normal distribution with mean $\mathbf{m}$ and variance $\nu \mathbf{C}$, where

$$\mathbf{m} = \mathbf{m}_0 + \mathbf{C}_0 \tilde{\mathbf{F}}_k \left(\tilde{\mathbf{F}}_k' \mathbf{C}_0 \tilde{\mathbf{F}}_k + \mathbf{I}_{n_k}\right)^{-1} (\tilde{\mathbf{y}}_k - \tilde{\mathbf{F}}_k' \mathbf{m}_0),$$

$$\mathbf{C} = \mathbf{C}_0 - \mathbf{C}_0 \tilde{\mathbf{F}}_k \left(\tilde{\mathbf{F}}_k' \mathbf{C}_0 \tilde{\mathbf{F}}_k + \mathbf{I}_{n_k}\right)^{-1} \tilde{\mathbf{F}}_k' \mathbf{C}_0.$$

After we obtain posterior samples of the model parameters, it is straightforward to get in-sample posterior predictions. Denote the $s$-th posterior sample of $\omega$, $\boldsymbol{\beta}$, and $\nu$ as $\omega^{(s)}$, $\boldsymbol{\beta}^{(s)}$, and $\nu^{(s)}$. To get the $s$-th posterior sample of $y_t$, for $p + 1 \le t \le T$, we first sample a configuration variable $L_t^{(s)}$ from the discrete distribution with probabilities $\omega_1^{(s)}, \dots, \omega_K^{(s)}$. Then, given this configuration variable, we sample $y_t^{(s)}$ from the normal distribution with mean $\mathbf{f}_t' \boldsymbol{\beta}_{L_t^{(s)}}^{(s)}$ and variance $\nu^{(s)}$.
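Closing the loop, here is a sketch of the $\boldsymbol{\beta}_k$ update and of one in-sample posterior predictive draw, again under the assumed setup above; since the rows of `F` are $\mathbf{f}_t'$, the $p \times n_k$ matrix $\tilde{\mathbf{F}}_k$ is `F[L == k].T`.

```python
def sample_beta_k(y, F, L, k, m0, C0, nu):
    """Full conditional of beta_k: N(m, nu * C), using only data with L_t = k."""
    y_k = y[L == k]                             # y-tilde_k
    F_k = F[L == k].T                           # p x n_k design matrix F-tilde_k
    A = F_k.T @ C0 @ F_k + np.eye(len(y_k))     # F_k' C0 F_k + I_{n_k}
    gain = C0 @ F_k @ np.linalg.inv(A)
    m = m0 + gain @ (y_k - F_k.T @ m0)
    C = C0 - gain @ F_k.T @ C0
    return rng.multivariate_normal(m, nu * C)

def predict_in_sample(F, omega_s, beta_s, nu_s):
    """One in-sample predictive draw: sample each L_t^(s), then y_t^(s)."""
    L_s = rng.choice(len(omega_s), size=F.shape[0], p=omega_s)
    means = np.sum(F * beta_s[L_s], axis=1)     # f_t' beta_{L_t^(s)}
    return rng.normal(means, np.sqrt(nu_s))
```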