[SOUND] Let's talk about multivariate covariances. These are described less often than multivariate variances, but I find them very useful, so I'm going to go over their properties quickly. If I have a vector x and another vector y, then the covariance between x and y is defined as the expected value of x minus its mean, call it mu_x, in an outer product with y minus its mean, mu_y, transposed: Cov(x, y) = E[(x - mu_x)(y - mu_y)^T]. Notice first off that the multivariate covariance is not symmetric, so Cov(x, y) is not necessarily equal to Cov(y, x); in fact Cov(y, x) = Cov(x, y)^T. We also note that if we plug in y equal to x, we get the variance, so Cov(x, x) = Var(x). It also has a shortcut formula, just like univariate covariance calculations have a shortcut formula: Cov(x, y) = E[x y^T] - E[x] E[y]^T, that is, the expected value of the outer product of x and y minus the outer product of the expected value of x and the expected value of y. And given the way in which we derived the shortcut formula for the variance, you should be able, at this point, to derive the shortcut formula for the covariance. So the covariance has some nice properties. Perhaps the most useful one is that if we take Cov(Ax, By), where A and B are constant matrices, then that's going to be A Cov(x, y) B^T. The left-hand argument's matrix gets pulled out to the left side, and the right-hand one gets pulled out to the right side, but then gets transposed. The second thing is additivity: suppose we have three random vectors, then Cov(x + y, z) = Cov(x, z) + Cov(y, z), and similarly Cov(x, y + z) = Cov(x, y) + Cov(x, z). With these rules we can also derive useful formulas, such as the variance of a sum.
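The rules above can be checked numerically. This is a minimal sketch using NumPy and empirical (sample) moments in place of expectations; the identities hold exactly for the empirical versions too, since they are pure linear algebra. The helper name cross_cov and the particular dimensions are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 3, 2            # sample size and dimensions (arbitrary)
X = rng.normal(size=(n, p))    # each row is a draw of the vector x
Y = X @ rng.normal(size=(p, q)) + rng.normal(size=(n, q))  # y correlated with x

def cross_cov(X, Y):
    """Empirical Cov(x, y) = E[(x - mu_x)(y - mu_y)^T]."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return Xc.T @ Yc / X.shape[0]

# Cov(Ax, By) = A Cov(x, y) B^T for constant matrices A and B
A = rng.normal(size=(4, p))
B = rng.normal(size=(3, q))
lhs = cross_cov(X @ A.T, Y @ B.T)   # Cov(Ax, By)
rhs = A @ cross_cov(X, Y) @ B.T     # A Cov(x, y) B^T
print(np.allclose(lhs, rhs))        # True

# The shortcut formula: Cov(x, y) = E[x y^T] - E[x] E[y]^T
shortcut = X.T @ Y / n - np.outer(X.mean(axis=0), Y.mean(axis=0))
print(np.allclose(cross_cov(X, Y), shortcut))  # True
```

Note that Cov(Ax, By) here is 4 by 3, not square, which previews the dimension discussion below: the covariance of two vectors of different lengths is a rectangular matrix.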
Now we can write out the variance of a sum, and I suggest that you prove this yourself using the collection of rules that we've given you so far: Var(x + y) = Var(x) + Var(y) + Cov(x, y) + Cov(y, x). Notice that in this case the dimensions work out because we are assuming that x and y are both n by 1, so the addition is meaningful, and therefore Cov(x, y) and Cov(y, x) have the same dimension. That's not guaranteed in general: it will often be the case that the covariance is not a square matrix, for example if y has a different dimension than x. But here we are assuming the dimensions match because x + y has to be meaningful. So you get this nice covariance formula. And if x and y are mutually uncorrelated, meaning every component of the vector x is uncorrelated with every component of the vector y, then those two covariance terms are 0 and the variance distributes across the sum, just like in the univariate case with uncorrelated random variables. So I think that's pretty much all you need to know, except for one more extremely useful fact. Often, and this may seem strange, two random vectors Ay and By that are functions of the same originating vector y can have covariance zero. We have Cov(Ay, By) = A Cov(y, y) B^T, and if we call the covariance of y with itself the multivariate covariance matrix Sigma, this is A Sigma B^T. So what we can see is that Ay is going to be uncorrelated with By if and only if A Sigma B^T is exactly zero, okay? And that's a tremendously useful fact that we will use quite often. In the next couple of parts of this section we're going to talk about quadratic forms and how we can calculate moments in those cases.
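Both facts from this paragraph can be illustrated with a short NumPy sketch; the empirical-moment helper and the specific matrices A and B below are illustrative assumptions, not anything from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 3
X = rng.normal(size=(n, p))              # draws of x, each row n-by-1 conceptually
Y = rng.normal(size=(n, p)) + 0.5 * X    # same dimension as x, so x + y is meaningful

def cov(U, V):
    """Empirical Cov(u, v); cov(U, U) is the empirical variance matrix."""
    Uc, Vc = U - U.mean(axis=0), V - V.mean(axis=0)
    return Uc.T @ Vc / U.shape[0]

# Var(x + y) = Var(x) + Var(y) + Cov(x, y) + Cov(y, x)
lhs = cov(X + Y, X + Y)
rhs = cov(X, X) + cov(Y, Y) + cov(X, Y) + cov(Y, X)
print(np.allclose(lhs, rhs))            # True

# Ay uncorrelated with By iff A Sigma B^T = 0. As a concrete case, take
# Sigma = I (y standard normal) and rows of A orthogonal to rows of B.
Sigma = np.eye(3)
A = np.array([[1.0, 0.0, 0.0]])
B = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(np.allclose(A @ Sigma @ B.T, 0))  # True: Ay and By are uncorrelated
```

The first check holds exactly for empirical moments because the variance-of-a-sum formula is an algebraic identity; the second is the criterion A Sigma B^T = 0 applied to a concrete Sigma.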
And we'll talk about a way in which we can prove an optimality result now that we have multivariate expected values. If we make some assumptions about our response and our predictors, we can start to add statistical properties to our least squares estimates, which up to this point we've only discussed in terms of their mathematical properties.