Welcome back. Today we are going to talk about the curse of dimensionality in estimating portfolio risk parameters, the risk parameters for individual securities, and then we are going to try and explain how we can handle this curse of dimensionality. What do we mean by the curse of dimensionality? Well, it's actually pretty simple. The number of parameters that we have to estimate tends to increase dramatically as we increase the number of constituents. So if we call N the number of constituents, then it turns out that you have to estimate N expected returns, but you have to estimate N times N minus one divided by two correlation parameters. In terms of volatilities, the situation is not too bad: you also need to estimate N volatilities, one for each component of your portfolio. So clearly, the problem lies with the correlation parameters.

Let me give you an example. If you look at a universe with 100 stocks, for example, then you are going to have to estimate 100 expected return parameters and 100 volatility parameters. Which is quite a lot, but that's okay, compared to having to estimate about 5,000 correlation parameters. That's exactly what we mean by the curse of dimensionality. The number of correlation parameters increases as the square of the number of constituents divided by two, and that increases very fast.

Now this is a very big problem, because to estimate so many parameters we would need to have access to extremely large sample sizes, and that's not exactly the case, even though we do have high-frequency data at our disposal, something like daily, weekly, or monthly data on stock returns, for example. So it sounds like there's a lot of data that we can use. But when you think about it, if you're looking at 250 daily returns per year, which is roughly the number of trading days for equity markets, that means that if we use 10 years' worth of data, we get something like 2,500 sample points. Well, that's a bit of a problem, because in this example we have to estimate 5,000 parameters. So the number of parameters that we have to estimate is twice as big as the sample size, and remember that we're looking at a pretty high frequency, daily, and a pretty long history, which is 10 years. So clearly, the conclusion is that we must find a way to deal with this by trying to reduce the number of parameters to estimate, because if we don't, then each one of these parameter estimates will be pretty noisy, not very meaningful. So that's the real challenge.

Now, how would you reduce the number of parameters to estimate? Well, there's an easy way that just says you need to reduce the number of constituents in your portfolio. But that's not a real way out of the problem, because the number of constituents in your portfolio is given by the context, is given by the benchmark, is given by your client. So typically you don't have much of a choice: you cannot say, okay, I would like to optimize a 100-stock portfolio, but I can't for technical reasons, so I'm going to pick 10 stocks and optimize the 10-stock portfolio. That's not a reasonable approach. So today we are precisely going to discuss how we can reduce the number of parameters in a meaningful way. So, to sum up, what can we do? First of all, increase the sample size as much as possible, which means increasing the frequency and increasing the time history. We're going to have to do this to some extent, but there's going to be a limit to how much of this we can do.
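To make the counting concrete, here is a minimal sketch of the arithmetic above in Python; the variable names are purely illustrative, not something from the lecture itself.

```python
# Parameter-count arithmetic for a universe of N stocks, as discussed above.

n = 100                      # number of constituents in the universe
years = 10                   # length of the sample history
obs_per_year = 250           # roughly the number of trading days per year

expected_returns = n                  # one expected return per stock
volatilities = n                      # one volatility per stock
correlations = n * (n - 1) // 2       # one correlation per pair of stocks

sample_size = years * obs_per_year    # number of return observations per stock

print(f"expected returns: {expected_returns}")   # 100
print(f"volatilities:     {volatilities}")       # 100
print(f"correlations:     {correlations}")       # 4950, i.e. about 5,000
print(f"sample size:      {sample_size}")        # 2500 daily observations
```

With 4,950 correlations to estimate from 2,500 observations per stock, the sample is simply too small relative to the number of parameters, which is exactly the point made above.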
So at some point we'll have to reduce the number of parameters, and this is going to happen through the introduction of some structure. We are going to impose some structure on the data, as you will see.

First of all, let me introduce the most basic, the simplest estimator for covariance parameters. We call that the sample estimate, or historical estimate, of covariance parameters. What this equation shows is very simple. Essentially, if you need to estimate the covariance between stock i and stock j, you take a sample and you just measure the sample covariance, which is given by the average, over the sample period, of the cross products: the return on stock i minus the mean return on that stock, times the return on stock j minus the mean return on stock j. That's exactly the definition of covariance. From that, you can divide by your best estimate for volatility, which is just the sample volatility, and then you're going to get a correlation estimate if you need it. So we can do this, but then we have to estimate 5,000 parameters in the example where the number of constituents was 100.

So how can we reduce the number of parameters? Well, there's a pretty effective way of doing this, a very dramatic reduction of the number of parameters, which is known as the constant correlation model. The constant correlation model allows you to reduce sample risk dramatically by reducing the number of parameters. So how does it work? Well, what you're going to do, as opposed to looking at your covariance matrix where the general entry is Sigma i times Sigma j times Rho ij, is impose the assumption, the structure, that all of these correlation parameters are identical. So the general entry of your matrix is now going to be written as Sigma i times Sigma j times Rho, where Rho is the supposedly common value for all of the correlation parameters. Now, of course, you know that this is not true; you know that the true population values for these correlation parameters are not all identical. The reason why you force them to be identical is because you want to impose some structure and reduce the number of parameters to estimate. So to make a long story short, the trade-off is this: you're hoping, by doing this, that it is better to estimate a single parameter accurately than to try to estimate many parameters, each one of them poorly.

So what you're going to do is take as an estimator Sigma ij hat, your estimator for covariance, equal to Sigma i hat, which is your estimator for stock i's volatility, times Sigma j hat, which is your estimator for stock j's volatility, times Rho hat, where Rho hat is precisely the best estimate for this unique common parameter. Now, how do you come up with an estimate for Rho hat? Well, this equation tells you something pretty intuitive. The best way to estimate Rho hat is to take each sample-based estimate for Rho ij, let's call them Rho hat ij, and then take the average of all of them, over all pairs of stocks in your sample, and that's exactly what will give you Rho hat. So in other words, if the sample estimate for some pair of stocks' correlation tells you it's 0.9, well, that looks suspiciously high, and another estimate might be, say, minus 0.7, which looks really low. You're saying, how come two stocks have such a big negative correlation, while they are impacted by the same type of market environment? That's a bit suspicious.
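Here is a minimal sketch of both estimators in Python, assuming the returns arrive as a NumPy array with one row per observation and one column per stock; the function names are mine, not something from the lecture.

```python
import numpy as np

def sample_cov(returns):
    """Plain sample covariance matrix: the average of the cross products
    of de-meaned returns. `returns` is a (T, N) array of T observations
    for N stocks."""
    demeaned = returns - returns.mean(axis=0)
    t = returns.shape[0]
    return demeaned.T @ demeaned / t

def constant_correlation_cov(returns):
    """Constant correlation estimator: keep the sample volatilities, but
    replace every pairwise correlation by the average of all sample
    correlations (Rho hat)."""
    sigma = returns.std(axis=0)                  # sample volatilities
    corr = np.corrcoef(returns, rowvar=False)    # sample correlation matrix
    n = corr.shape[0]
    # Average of the off-diagonal sample correlations = Rho hat.
    rho_hat = (corr.sum() - n) / (n * (n - 1))
    const_corr = np.full((n, n), rho_hat)
    np.fill_diagonal(const_corr, 1.0)
    return np.outer(sigma, sigma) * const_corr
```

Note the effect on the parameter count: with the constant correlation structure, the covariance matrix for 100 stocks is pinned down by 100 volatilities plus a single correlation number, instead of 100 volatilities plus roughly 5,000 pairwise correlations.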
So your concern tends to be related to the fact that you know it's probably because of sample risk, of noise, that one of these estimates is too high and the other one is too low. So what you're going to do is shrink the dispersion of these estimates by forcing, imposing, that they are all identical; you take the average, and the average will turn out to be a pretty meaningful number.

Now, you might wonder: such a dramatic simplification of the problem sounds a little extreme, and it sounds like you're not going to be doing a good job by throwing away so much information, by only looking at the average correlation and imposing that each correlation is equal to the average. Well, actually, there's a paper written in 1973 by Elton and Gruber, two professors at NYU. In that paper, what they have shown is that when you use the constant correlation-based estimate for the covariance matrix, you actually get more meaningful portfolios out of sample. They looked at minimum variance portfolios built on the basis of either the constant correlation estimate or the sample-based estimate, and what they found is that the minimum variance portfolio constructed using the constant correlation estimate performed better out of sample than the one using the sample estimate. So indeed the trade-off was worth it: reducing sample risk, even though it came at the cost of a huge amount of model risk, eventually paid off.

To wrap up: there's a big concern that if you're looking at a high number of stocks in your portfolio, then there will be a large amount of sample risk in each one of these parameter estimates. What you want to be doing is increasing the frequency as much as possible and increasing the time period as far as it stays reasonable. But that's not going to be sufficient in terms of increasing the sample size. At some point you're going to have to reduce the number of parameters, and the constant correlation model is a pretty effective way of doing it, even though it is admittedly fairly extreme. So next time we're going to be looking at other methodologies for reducing the number of parameters, and we're going to try and see whether they allow us to improve over these constant correlation parameter estimates, which are already an improvement over the sample-based method.
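To give a flavour of the kind of comparison Elton and Gruber ran, here is a toy sketch, on simulated placeholder data rather than their actual methodology: build the global minimum variance portfolio from each covariance estimate on the first half of a sample and compare realized volatilities on the second half. It reuses the two estimator functions sketched earlier.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum variance weights: proportional to inv(cov) @ 1,
    rescaled to sum to one (full investment, no other constraints)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

# Placeholder simulated returns, only to make the sketch runnable:
# 2,500 daily observations for 100 stocks.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(2500, 100))
r_in, r_out = returns[:1250], returns[1250:]   # estimation / evaluation split

# Requires sample_cov and constant_correlation_cov from the earlier snippet.
w_sample = min_variance_weights(sample_cov(r_in))
w_cc = min_variance_weights(constant_correlation_cov(r_in))

print("out-of-sample vol, sample covariance:   ", (r_out @ w_sample).std())
print("out-of-sample vol, constant correlation:", (r_out @ w_cc).std())
```

On real stock return data, the lecture's point is that the second number tends to come out better, which is the Elton and Gruber finding described above.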