Today we're going to introduce autoregressive time series models, usually referred to as AR models. AR models are central to the analysis of stationary time series data, and, in suitably modified and generalized forms, they also appear as components of larger models for non-stationary, time-varying settings. Let us consider a time series of equally spaced quantities $y_t$ for $t = 1, 2, \ldots, T$. The standard autoregressive model framework of order $p$, usually denoted AR(p), assumes $y_t$ arises from the model
$$y_t = \sum_{j=1}^{p} \phi_j \, y_{t-j} + \epsilon_t.$$
Here the $\phi_j$ are called the AR coefficients, and the $\epsilon_t$ are known as innovations. This is a sequentially defined model: $y_t$ is generated as a function of past values, parameters, and errors. The innovations are assumed to satisfy two conditions. First, $\epsilon_t$ is conditionally independent of past values of the series, that is, independent of $y_\tau$ for all $\tau < t$. Second, the $\epsilon_t$ are independent and identically distributed from a normal distribution with mean zero and variance $v$: $\epsilon_t \overset{\text{iid}}{\sim} N(0, v)$.

Now we proceed to the likelihood for this model class. The sequential definition of the model and its Markovian nature imply a sequential structuring of the likelihood. For any $T > p$, we can write the likelihood as
$$p(y_{1:T} \mid \phi_1, \ldots, \phi_p, v) = p(y_{1:p} \mid \phi_1, \ldots, \phi_p, v) \prod_{t=p+1}^{T} p(y_t \mid y_{(t-p):(t-1)}, \phi_1, \ldots, \phi_p, v).$$
The product comes from the autoregressive structure of the model, and the leading term is the joint density of the $p$ initial values of the series. Alternatively, we can view the first $p$ values of the series as fixed constants. Suppose $T = n + p$ for some $n > 1$. Then the conditional likelihood for the $n$ data points $y_{p+1}, \ldots, y_T$, given the first $p$ values, the AR coefficients, and the variance $v$, can be written as
$$p(y_{(p+1):T} \mid y_{1:p}, \phi_1, \ldots, \phi_p, v) = \prod_{t=p+1}^{T} p(y_t \mid y_{(t-p):(t-1)}, \phi_1, \ldots, \phi_p, v).$$

Now let us take another look at the AR equation. Because $y_t = \sum_{j=1}^{p} \phi_j y_{t-j} + \epsilon_t$ and $\epsilon_t \sim N(0, v)$, we can equivalently write
$$y_t \sim N\Big(\sum_{j=1}^{p} \phi_j y_{t-j}, \; v\Big).$$
Writing out this summation every time is cumbersome, so let us define some notation to simplify it. Define the vector $\mathbf{f}_t = (y_{t-1}, y_{t-2}, \ldots, y_{t-p})'$ and the vector $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_p)'$. The summation can then be written as $\mathbf{f}_t' \boldsymbol{\phi}$, so equivalently $y_t \sim N(\mathbf{f}_t' \boldsymbol{\phi}, v)$. Plugging this in, the conditional likelihood becomes
$$\prod_{t=p+1}^{T} N(y_t \mid \mathbf{f}_t' \boldsymbol{\phi}, v).$$
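To make this conditional likelihood concrete, here is a minimal Python sketch, not part of the lecture itself: the AR(2) coefficient values, the seed, and the function name are illustrative assumptions. It simulates an AR(p) series and evaluates $\prod_{t=p+1}^{T} N(y_t \mid \mathbf{f}_t' \boldsymbol{\phi}, v)$ on the log scale.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative AR(2): these phi and v values are made up for the example.
phi = np.array([0.6, -0.3])   # AR coefficients (phi_1, phi_2)
v = 1.0                       # innovation variance
p, T = len(phi), 500

# Simulate y_t = f_t' phi + eps_t with eps_t ~ N(0, v),
# where f_t = (y_{t-1}, ..., y_{t-p})'.
y = np.zeros(T)
for t in range(p, T):
    f_t = y[t - p:t][::-1]
    y[t] = f_t @ phi + rng.normal(0.0, np.sqrt(v))

def ar_conditional_loglik(y, phi, v):
    """Log of prod_{t=p+1}^{T} N(y_t | f_t' phi, v),
    conditioning on the first p values of the series."""
    p = len(phi)
    f = np.array([y[t - p:t][::-1] for t in range(p, len(y))])  # n x p
    return stats.norm.logpdf(y[p:], loc=f @ phi, scale=np.sqrt(v)).sum()

print(ar_conditional_loglik(y, phi, v))
```

Note that the first $p$ simulated values are simply held fixed, exactly as in the conditional likelihood above.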
Another, more compact way to write this is to define a matrix $\mathbf{F}$ that collects the vectors $\mathbf{f}_t$ as its columns:
$$\mathbf{F} = \big(\mathbf{f}_{p+1}, \mathbf{f}_{p+2}, \ldots, \mathbf{f}_{T}\big) = \begin{pmatrix} y_p & y_{p+1} & \cdots & y_{T-1} \\ y_{p-1} & y_p & \cdots & y_{T-2} \\ \vdots & \vdots & & \vdots \\ y_1 & y_2 & \cdots & y_{T-p} \end{pmatrix}.$$
So the first column is $(y_p, y_{p-1}, \ldots, y_1)'$, the second column is $(y_{p+1}, y_p, \ldots, y_2)'$, and the last column is $(y_{T-1}, y_{T-2}, \ldots, y_{T-p})'$. We have a product of $n$ univariate normal distributions, and because they are conditionally independent with common variance $v$, we can also write it as an $n$-dimensional multivariate normal distribution with mean $\mathbf{F}'\boldsymbol{\phi}$ and covariance matrix $v \mathbf{I}_n$, where $\mathbf{I}_n$ is the identity matrix.

Notice the similarity between the AR model and the multiple linear regression model. Denote the data $y_{p+1}, \ldots, y_T$ by the vector $\mathbf{y} \in \mathbb{R}^n$. In matrix form, $\mathbf{y} = \mathbf{F}'\boldsymbol{\phi} + \boldsymbol{\epsilon}$, where $\mathbf{y}$ is $n \times 1$, $\mathbf{F}$ is the $p \times n$ matrix defined above (so $\mathbf{F}'$ is $n \times p$), $\boldsymbol{\phi}$ is $p \times 1$, and $\boldsymbol{\epsilon}$ is $n \times 1$; you can check that the dimensions are consistent. Here $\boldsymbol{\epsilon} \sim N(\mathbf{0}, v\mathbf{I}_n)$. From this we immediately obtain the likelihood for $\mathbf{y}$:
$$p(\mathbf{y} \mid \phi_1, \ldots, \phi_p, v) = N(\mathbf{F}'\boldsymbol{\phi}, \; v\mathbf{I}_n).$$
This matrix notation and the similarity with the multiple linear regression model allow us to directly use results from Bayesian linear regression to choose priors and perform posterior inference. To complete the model formulation, we use a conditionally conjugate prior for the model parameters $\boldsymbol{\phi}$ and $v$. Because the prior is conditionally conjugate, it factors as $p(\boldsymbol{\phi}, v) = p(\boldsymbol{\phi} \mid v)\, p(v)$. For the first factor we use a normal distribution, $\boldsymbol{\phi} \mid v \sim N(\mathbf{m}_0, v\mathbf{C}_0)$, and for the prior on $v$ we use an inverse gamma distribution, $v \sim IG(n_0/2, d_0/2)$. Here $\mathbf{m}_0$, $\mathbf{C}_0$, $n_0$, and $d_0$ are all hyperparameters. This completes the model formulation of the autoregressive model.
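Because the model is a normal linear regression in $\boldsymbol{\phi}$, the posterior under this prior follows from the standard conjugate normal/inverse-gamma algebra. Below is a minimal Python sketch of that update, assuming those standard linear regression results; the function and variable names are mine, not the lecture's.

```python
import numpy as np

def ar_conjugate_posterior(y_full, p, m0, C0, n0, d0):
    """Conjugate update for an AR(p) written as y = F' phi + eps,
    eps ~ N(0, v I), with prior phi | v ~ N(m0, v C0), v ~ IG(n0/2, d0/2).
    Returns (mn, Cn, n_star, d_star) such that, a posteriori,
    phi | v, y ~ N(mn, v Cn) and v | y ~ IG(n_star/2, d_star/2)."""
    T = len(y_full)
    n = T - p
    y = y_full[p:]                      # responses y_{p+1}, ..., y_T
    # F is p x n; its column for time t is f_t = (y_{t-1}, ..., y_{t-p})'.
    F = np.column_stack([y_full[t - p:t][::-1] for t in range(p, T)])
    C0_inv = np.linalg.inv(C0)
    Cn_inv = C0_inv + F @ F.T           # posterior precision of phi (up to 1/v)
    Cn = np.linalg.inv(Cn_inv)
    mn = Cn @ (C0_inv @ m0 + F @ y)     # posterior mean of phi
    n_star = n0 + n
    d_star = d0 + y @ y + m0 @ C0_inv @ m0 - mn @ Cn_inv @ mn
    return mn, Cn, n_star, d_star
```

A draw from the joint posterior can then be obtained by first sampling $v \sim IG(n^*/2, d^*/2)$ and then $\boldsymbol{\phi} \mid v \sim N(\mathbf{m}_n, v\mathbf{C}_n)$.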