>> We're now going to review multivariate distributions. We'll talk about multivariate CDFs, multivariate PDFs, conditional distributions, and so on. Much of this material is a little tedious, a little dull, but we thought it was worthwhile collecting it all and having it in one place for your review if it crops up later in the course. Okay, so let's get started. Let X be a vector of random variables, X1 up to Xn. We say the joint CDF of X is given to us by the following. So the joint CDF, F_X of little x, is equal to the probability that X1 is less than or equal to little x1, X2 is less than or equal to little x2, and so on up to Xn being less than or equal to little xn. And from this joint CDF we can actually calculate the marginal CDFs. So, for example, the marginal CDF of Xi is given to us by just plugging infinity into all of the components in the joint CDF except for the ith component, which is little xi. Okay, so we can go from the joint CDF to the marginal CDF. It is also straightforward to generalize the previous definition to joint marginal distributions. So, for example, if I want the joint CDF of just Xi and Xj, I can also recover that from the joint CDF of X1 up to Xn by placing infinity in all of the arguments except for the ith argument, where I have xi, and the jth, where I have xj. We also say that X has a joint PDF, or probability density function, f subscript X, if we can write the joint CDF as an integral like this. So this is just the way we capture our joint CDF: by integrating the density function over the appropriate limits.

Okay. We can also talk about conditional CDFs. So what we're going to do is partition our vector X1 up to Xn into two components. The first component is boldface X1, which contains X1 up to Xk, and the second component is boldface X2, which contains Xk+1 up to Xn. And then we can talk about the conditional CDF of X2 given X1, and in fact it's defined as follows. The conditional CDF of X2 given X1 is equal to the probability that the random vector X2 is less than or equal to little x2, conditional on X1 being equal to little x1. If X has a PDF, f_X, then the conditional PDF of X2 given X1 is given to us by this quantity here: it's the joint PDF divided by the marginal PDF of X1, which we can also write like this. Okay. And the conditional CDF of X2 given X1 can be determined by integrating the conditional PDF. So this is our conditional PDF, and we can integrate it with respect to u_{k+1} up to u_n, and that will give us our conditional CDF.

Okay, independence. We say the collection X is independent if the joint CDF can be factored into the product of marginal CDFs. So in particular, the joint CDF here on the left-hand side is equal to the product of the marginal CDFs over here on the right-hand side. Similarly, this implies that if X has a PDF f_X, then we can also factorize the joint PDF into the product of the marginal PDFs over here. We can also use (1), which is on the previous slide, to see that if X1 and X2 are independent, then the conditional PDF of X2 given X1 is, by (1), equal to this ratio here: the joint PDF of X divided by the marginal PDF of X1. And by independence, we can replace the joint PDF by the product; then these two terms cancel and we're left with the marginal PDF of X2. So what we're saying here is that if X1 and X2 are independent, then the conditional PDF of X2 given X1 is simply the marginal PDF, f of X2.
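Just for reference, here is a sketch of the definitions we just went through, written out in standard notation, with boldface X1 = (X1, ..., Xk) and X2 = (Xk+1, ..., Xn) as above. This is only a transcription of the spoken formulas, not anything beyond what was said on the slides.

```latex
% Joint CDF of X = (X_1, \dots, X_n)
F_X(x) = P\bigl(X_1 \le x_1,\; X_2 \le x_2,\; \dots,\; X_n \le x_n\bigr)

% Marginal CDF of X_i: plug infinity into every other argument
F_{X_i}(x_i) = F_X(\infty, \dots, \infty,\, x_i,\, \infty, \dots, \infty)

% Joint PDF f_X (when it exists): the joint CDF is the integral of the density
F_X(x) = \int_{-\infty}^{x_1} \!\cdots\! \int_{-\infty}^{x_n} f_X(u_1, \dots, u_n)\, du_n \cdots du_1

% Conditional PDF of X_2 = (X_{k+1}, \dots, X_n) given X_1 = (X_1, \dots, X_k)
f_{X_2 \mid X_1}(x_2 \mid x_1) = \frac{f_X(x_1, x_2)}{f_{X_1}(x_1)}

% Independence: the joint CDF and the joint PDF factor into products of marginals
F_X(x) = \prod_{i=1}^{n} F_{X_i}(x_i), \qquad f_X(x) = \prod_{i=1}^{n} f_{X_i}(x_i)
```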
In other words, having information about X1 tells you nothing about X2 when X1 and X2 are independent. Okay. Some implications of independence. Well, I expect we're all familiar with this, but let's go through it anyway. Let X and Y be independent random variables. Then for any events A and B, the probability that X is in A and Y is in B factorizes into the product of the probability of X being in A times the probability of Y being in B. More generally, for any functions f and g, independence of X and Y implies that the expected value of f of X times g of Y is equal to the expected value of f of X times the expected value of g of Y. And in fact, (2) follows from (3), okay? So the implication goes that way, and it's easy to see this, because we can write this probability of X being in A and Y being in B as the expected value of the indicator function of X in A times the indicator function of Y in B. Just to remind ourselves what this indicator function is: it takes on two possible values, the value 1 if X is in A and the value 0 otherwise. So the product of these two indicator functions is 1 or 0, and it will only be 1 if X is in A and Y is in B. Okay? That occurs with the probability that X is in A and Y is in B, so this statement here is correct. Okay, so we've got this first line. And now we can use the independence of X and Y in condition (3) to break this expectation down into the product of these two separate expectations. Okay. But of course, this expectation is the probability that X is in A, and this expectation is the probability that Y is in B. So indeed we do see that we can go from (3) to (2). Okay. More generally, if X1 up to Xn are independent random variables, then the expected value of f1 of X1 times f2 of X2, and so on up to fn of Xn, factorizes into a product of n separate expectations: the expected value of f1 of X1 times the expected value of f2 of X2, and so on.

Random variables can also be conditionally independent. For example, we say that X and Y are conditionally independent given Z if the expected value of f of X times g of Y given Z is equal to the expected value of f of X given Z times the expected value of g of Y given Z, and I should mention this is for all functions f and g. Okay, and in fact this idea of conditional independence we're going to see later in the course, because it's used in the, well, now infamous Gaussian copula model for pricing CDOs. So just to give you a brief idea of how it might be used in a bond or CDO context, let Di be the event that the ith bond in a portfolio defaults, okay? So we'll assume that there is a portfolio of n bonds. Okay. It's not reasonable to assume that the Di's are independent. You might ask, why is that? Well, if you think about it, there will be all sorts of macroeconomic factors or industry-specific factors which will cause defaults to be dependent. So, for example, maybe some industry crashes, and that might cause not just one firm to default but multiple firms in that industry to default. And so it doesn't make sense to assume that these events, these Di's, are independent. But we might be able to say that they're conditionally independent given some other random variable Z. Z, for example, might reflect some industry factor, some factor that governs how well a particular industry is doing.
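As a quick aside before we come back to the default example, here is a minimal Python sketch (not from the lecture) that checks the factorization property (3), that E[f(X)g(Y)] = E[f(X)] E[g(Y)] for independent X and Y, by Monte Carlo. The particular distributions and the functions f and g are arbitrary choices made just for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent X ~ N(0, 1) and Y ~ Uniform(0, 1); any independent pair would do
X = rng.standard_normal(n)
Y = rng.uniform(size=n)

# Arbitrary functions f and g for the check
f = lambda x: np.exp(-x**2)
g = lambda y: np.cos(y)

# Under independence, E[f(X) g(Y)] should match E[f(X)] * E[g(Y)]
lhs = np.mean(f(X) * g(Y))
rhs = np.mean(f(X)) * np.mean(g(Y))
print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```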
In that case, if we assume that the default events are conditionally independent given Z, then we can write the probability of D1 up to Dn given Z as the product of these factors here: the probability of D1 given Z up to the probability of Dn given Z. And it's actually often easy to compute these quantities. So we'll be using this kind of idea later in the course, as I said, when we discuss the Gaussian copula model for pricing CDOs. We'll also see it in a couple of other applications as well.

Okay, so very briefly, I also want to mention the mean vector and covariance matrix of a vector of random variables X. I hope we're all familiar with this already, but let's go through it anyway. The mean vector of X is simply the vector of expected values, the expected value of X1 up to the expected value of Xn, and the covariance matrix of X is, well, this matrix of covariances. Okay, so the formula is the expected value of X minus the expected value of X, times X minus the expected value of X transposed. And just to be clear, this is an n by 1 vector and this is a 1 by n vector, so the product is n by n, and we get an n by n covariance matrix with the (i, j)th element of Sigma being the covariance of Xi and Xj. The covariance matrix is symmetric; that, of course, is because the covariance of Xi and Xj is equal to the covariance of Xj and Xi. The diagonal elements satisfy Sigma_ii greater than or equal to 0, and of course the diagonal elements are just the variances, so Sigma_ii is equal to the variance of Xi, and variances are always nonnegative. The covariance matrix is also positive semi-definite. This is an important, well-known property of a covariance matrix; in particular, it means that x transpose Sigma x is greater than or equal to 0 for all vectors x in R^n. The correlation matrix, rho_X, is similar to the covariance matrix except that it has as its (i, j)th element the correlation of Xi with Xj. It is itself symmetric, positive semi-definite, and has 1's along the diagonal. And just to remind ourselves, the correlation of Xi and Xj is equal to the covariance of Xi and Xj divided by, well, the square root of the variance of Xi times the variance of Xj. Okay. For any k by n matrix A and k by 1 vector a, we can take the linear combination AX plus little a, and we can compute the mean of this vector. The mean is A times the expected value of X plus little a, and the covariance matrix of this new vector of random variables is A times the covariance matrix of X times A transpose. And of course, (5) actually implies this result, which you're probably familiar with: the variance of aX plus bY equals a squared times the variance of X, plus b squared times the variance of Y, plus 2ab times the covariance of X and Y. Note that if X and Y are independent, then the covariance of X and Y equals 0, but the converse is not true in general. Some people tend to forget this, but it is not in general true that if the covariance of two random variables equals zero, then those two random variables are independent. That is not true.
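To make the conditional independence idea a bit more concrete, here is a small Python sketch of how one might compute a joint default probability by first conditioning on a common factor Z and then using the product form above. The one-factor specification of P(Di | Z), the loading a, and the default probabilities are purely illustrative assumptions, loosely in the spirit of the Gaussian copula model we will meet later; this is not the lecture's model.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Assumed unconditional default probabilities for a small portfolio of n = 3 bonds
p = np.array([0.02, 0.05, 0.10])
a = 0.4   # assumed loading of each bond on the common factor Z

def conditional_default_probs(z, p, a):
    """Illustrative one-factor model: conditional on Z = z, defaults are independent with
    P(D_i | Z = z) = Phi((Phi^{-1}(p_i) - a*z) / sqrt(1 - a^2))."""
    return norm.cdf((norm.ppf(p) - a * z) / np.sqrt(1.0 - a**2))

# P(D_1, ..., D_n) = E_Z[ P(D_1 | Z) * ... * P(D_n | Z) ], estimated by Monte Carlo over Z ~ N(0,1)
Z = rng.standard_normal(200_000)
cond = conditional_default_probs(Z[:, None], p, a)   # shape (number of draws, n)
joint_default_prob = np.prod(cond, axis=1).mean()
print(joint_default_prob)
```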
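And finally, a short numpy sketch of the mean vector, covariance matrix, correlation matrix, the linear transform rule in (5), and the standard counterexample showing that zero covariance does not imply independence. All of the numbers here (the mean, the covariance matrix, the matrix A) are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a sample from a random vector X in R^3 (parameters are illustrative only)
n = 100_000
X = rng.multivariate_normal(mean=[1.0, 0.0, -1.0],
                            cov=[[2.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=n)

mu = X.mean(axis=0)                    # mean vector E[X]
Sigma = np.cov(X, rowvar=False)        # covariance matrix: symmetric, variances on the diagonal
rho = np.corrcoef(X, rowvar=False)     # correlation matrix: symmetric, 1's on the diagonal

# Positive semi-definite: every eigenvalue of Sigma is >= 0 (up to numerical error)
print(np.linalg.eigvalsh(Sigma).min())

# Linear transform Y = A X + a: E[Y] = A mu + a and Cov(Y) = A Sigma A^T
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])
a = np.array([0.0, 1.0])
Y = X @ A.T + a
print(np.allclose(np.cov(Y, rowvar=False), A @ Sigma @ A.T))   # holds exactly for sample covariances too

# Zero covariance does NOT imply independence: W ~ N(0,1) and V = W**2 are clearly dependent,
# yet Cov(W, V) = E[W**3] = 0.
W = rng.standard_normal(n)
V = W**2
print(np.cov(W, V)[0, 1])   # close to 0, even though V is a deterministic function of W
```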