Why is parameter error so serious? To understand this, let's walk through a very simple example. Suppose I have two identical assets with mean mu, variance sigma squared, and correlation equal to 0. Then the optimal investment for these two assets would be to put half the portfolio in asset one and half in asset two; that's what would give you the least volatility. Suppose now that the estimates for these returns are slightly off their true values. I estimate the return on asset one to be slightly larger than the true value, mu plus epsilon, and I estimate the mean return on asset two to be slightly smaller than the true value, mu minus epsilon. So on average, I'm making zero error; on average, the estimator is very good. If you were thinking about the properties of a statistical estimator, you would say that whatever estimator is being used here is pretty good: across the assets, we are not making a lot of error. But the problem with mean-variance portfolio selection is that after estimating these parameters, I'm going to optimize my portfolio using them. So what happens? I've estimated that the return on asset one is slightly larger than the return on asset two, and therefore I will overweight asset one as compared to asset two. If I'm allowed short positions, then I'm going to short asset two and take a leveraged position in asset one. But this is precisely the wrong thing to do. If I take the portfolio that I compute, which overweights asset one and underweights asset two, and put it into the market, the overweighted asset performs worse than expected: its realized return is mu, which is epsilon below the estimate mu plus epsilon. The second asset has a realized return mu, which is epsilon larger than its estimated return mu minus epsilon. And this gap between the estimated performance and the realized performance becomes worse as more and more shorting is allowed.

This is what accounts for the big difference between the estimated performance and the realized performance. The main difficulty is that we take the estimated parameters and then optimize, and this optimization procedure inflates, or maximizes, the statistical errors in the parameters. There is a quote which sums up the situation: mean-variance optimization results in error-maximizing, investment-irrelevant portfolios. So we have to do something in order to make mean-variance portfolio selection practical.
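To make the two-asset example concrete, here is a minimal numerical sketch, not from the lecture, using NumPy. It solves the mean-variance problem maximize mu'x - (gamma/2) x'Sigma x subject to the budget constraint sum(x) = 1, once with the true means and once with the perturbed estimates. The function name mv_weights, the risk-aversion parameter gamma, and all the numbers are illustrative assumptions.

```python
import numpy as np

def mv_weights(mu, Sigma, gamma):
    """Mean-variance weights: maximize mu'x - (gamma/2) x'Sigma x  s.t.  sum(x) = 1,
    with shorting allowed (no sign constraints)."""
    inv = np.linalg.inv(Sigma)
    ones = np.ones(len(mu))
    lam = (ones @ inv @ mu - gamma) / (ones @ inv @ ones)   # multiplier for the budget constraint
    return inv @ (mu - lam * ones) / gamma

mu_true = np.array([0.05, 0.05])           # two identical assets: same true mean ...
Sigma   = np.diag([0.2**2, 0.2**2])        # ... same variance, zero correlation
eps     = 0.01
mu_est  = mu_true + np.array([eps, -eps])  # small estimation error that averages to zero

print(mv_weights(mu_true, Sigma, gamma=1.0))   # true optimum: [0.5, 0.5]

for gamma in (10.0, 1.0, 0.2):             # smaller gamma => the optimizer takes more leverage
    x = mv_weights(mu_est, Sigma, gamma)
    print(f"gamma={gamma:4.1f}  weights={x}  "
          f"estimated return={mu_est @ x:.4f}  realized return={mu_true @ x:.4f}")
```

As gamma is lowered and the optimizer is effectively allowed more leverage, the estimated return mu_est'x keeps climbing while the realized return mu_true'x stays pinned at mu, which is exactly the widening gap between estimated and realized performance described above.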
So one idea that might come out of looking at this slide is that the performance becomes worse as we allow more leverage. So perhaps we should limit short positions, or not allow short positions at all, and then see what happens to the performance. In this slide, I'm plotting what happens to the estimated frontier, which is the green line, and the realized frontier, which is the red line, when you have a no-short-sales constraint. As you can see, the realized frontier becomes very unstable; a large part of the curve down here is actually inefficient. The reason is that the feasible region for the portfolios now has a corner: if this is x1 and that is x2, you want x1 and x2 to be greater than or equal to 0, so you end up with a corner in the feasible region. And this corner causes problems; it causes instabilities in portfolio selection.

As you add more constraints, maybe some asset constraints, maybe constraints on how much money a particular sector can receive, and so on, all of these become linear constraints, and all of them induce more corners and more instabilities. If you want to get at what the no-short-sales constraint was really doing, which is to limit leverage, the better thing to do is to put a constraint on leverage directly. And if you put a constraint on leverage, you end up getting the performance shown in this curve. Now the realized performance of the portfolio is pretty close to the true performance; the gap between those two is small. But the gap between what was estimated and what was realized is still very large. So based on the data I expect to perform on the green line, while the realized performance is going to be the red line. And remember, this blue line, the true frontier, is not known in practice. So even though the realized performance and the true performance are very close, I have no way of knowing how well I'm performing. So leverage constraints do work well in practice, but the estimated frontier is still very bad, and some work is needed to bring it down.

The state of the art right now is something called robust portfolio selection. In robust portfolio selection, what one does is remove the target return constraint, which is imposed with respect to the estimated value of the mean, and replace it by a target return constraint with respect to the worst possible mean in the confidence region. So let S_M denote the confidence region for the mean; a few slides back I showed you that this confidence region is an ellipse. Instead of a target return constraint which says take the estimated mean, mu-hat transpose x, and insist that it be greater than or equal to r, we're going to replace it by this constraint. And what does this constraint say? It says: when you choose your portfolio x, the return you credit it with is the worst possible return over the confidence region. Any point in the confidence region is possible, and therefore this worst return is something you could actually see in the market. And now, instead of the original target return constraint, I put a constraint that this minimum value must be greater than or equal to r. I can still do portfolio selection with this constraint; it's a little bit harder, but not much harder. And now the picture I end up getting looks like the plot here. The estimated frontier starts coming down. Why does this happen? It happens because I'm now using the worst case: the estimated value of mu would give the estimated performance, but because I put in the worst-case constraint, that gets dragged down. The realized performance also becomes better than expected, so that gets pulled up, and therefore the gap between the two becomes very small. There are issues with this technology. You can sometimes get portfolios which are not very interpretable, and therefore it's having a little difficulty getting traction, but over time this technology, either directly or in some modified version, is likely to become very practical.
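As an aside, the inner minimization in this robust constraint has a simple closed form when the confidence region S_M is taken to be the ellipsoid { mu : (mu - mu_hat)' W^{-1} (mu - mu_hat) <= kappa^2 }: the worst-case expected return of a portfolio x is mu_hat'x - kappa * sqrt(x'Wx). Here is a minimal sketch of that computation; the function name, the matrix W, the radius kappa, and the numbers are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def worst_case_return(x, mu_hat, W, kappa):
    """Worst-case expected return of portfolio x over the ellipsoidal confidence
    region S_M = { mu : (mu - mu_hat)' W^{-1} (mu - mu_hat) <= kappa^2 }.
    The minimizing mu moves kappa units against x, giving the closed form
    mu_hat'x - kappa * sqrt(x' W x)."""
    return mu_hat @ x - kappa * np.sqrt(x @ W @ x)

# illustrative numbers only
mu_hat = np.array([0.06, 0.04, 0.05])   # estimated means
W      = 0.001 * np.eye(3)              # covariance of the mean estimate (defines the ellipse)
x      = np.array([0.5, 0.2, 0.3])      # a candidate portfolio

print(worst_case_return(x, mu_hat, W, kappa=2.0))
```

The robust target return constraint then reads mu_hat'x - kappa*sqrt(x'Wx) >= r, a second-order-cone constraint, which is consistent with the remark that the resulting problem is a little bit harder, but not much harder, than the original one.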
All of these methods were focused on trying to improve the optimization step. There is a flip side to this, where one instead tries to improve the estimation step. So here are some methods that people have used to improve the parameter estimates.

Some of the most popular are the so-called shrinkage methods, in which one shrinks the individual estimates towards some global quantity. These go back to Charles Stein; there is a 1961 paper by James and Stein. More recently, Ledoit and Wolf have extended the idea to covariance matrices and other settings. Let's take the case of the mean. Earlier, I would have estimated each of the asset means separately; mu-est-i stands for the estimated mean of asset i. Now, in the shrinkage technology, instead of just estimating each asset's mean, I'm also going to estimate a global mean, a global average mean across the assets. There is a reason why I put this estimation outside of the bracket: when I estimate this quantity, I don't simply take the estimates for all of the d assets and add them up. I assume that all d assets have the same expected mean, and use the data of all the assets to estimate that one mean. As a result, I have more data when I'm estimating the global mean than when I'm estimating a given asset's mean, so I expect the error in the global mean to be smaller. The error is smaller here, and larger in the individual means. Now, the shrunk estimate takes the estimate for a particular asset and the global estimate, let's just call it mu-bar, and moves along the line between them, parameterized by some alpha. When alpha is equal to 1, it's up here; when alpha is equal to 0, it's down here; for some intermediate value of alpha between 0 and 1, it's some point over here. This one has a very small error, that one has a bigger error, and when you shrink, the error at this intermediate point is smaller. The tradeoff is that as you decrease alpha and come closer to the global mean, you have less information about what the particular asset is going to do, but less statistical error; as you increase alpha, you have more information about what the asset is going to do, but more estimation error. So somewhere in between is the best thing.

The next expression is the same idea applied to the covariance matrix. So here, this label should read "shrunk" rather than "estimated", and the one down here should read "estimated". So we have a shrunk estimate for the covariance: it takes the estimated covariance matrix and shrinks it towards another covariance matrix in which all the assets have the same variance. Again, the idea is the same: if I want to compute one variance for all the assets, I have a lot more data, so I can estimate it better. And if I shrink the estimated covariance matrix towards this global covariance matrix, I end up with a better estimate, meaning an estimate with lower error.
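Here is a minimal sketch of both shrinkage steps in NumPy, purely for illustration. The function names are hypothetical, the shrinkage weight alpha is fixed by hand here, whereas James-Stein and Ledoit-Wolf derive a data-driven choice of alpha, and the covariance target (equal variances, zero correlations) is just one of several targets used in practice.

```python
import numpy as np

def shrink_mean(returns, alpha):
    """Shrink each asset's sample mean toward the grand mean of all assets.
    alpha = 1 keeps the individual estimates; alpha = 0 uses only the grand mean."""
    mu_hat = returns.mean(axis=0)          # per-asset sample means (d values)
    mu_bar = returns.mean()                # grand mean, pooled over all assets
    return alpha * mu_hat + (1 - alpha) * mu_bar

def shrink_cov(returns, alpha):
    """Shrink the sample covariance toward a target in which every asset has
    the same variance (the average sample variance) and zero correlation."""
    S = np.cov(returns, rowvar=False)
    target = np.mean(np.diag(S)) * np.eye(S.shape[1])
    return alpha * S + (1 - alpha) * target

# hypothetical data: 60 periods of returns on 5 assets
rng = np.random.default_rng(0)
R = rng.normal(0.01, 0.05, size=(60, 5))
print(shrink_mean(R, alpha=0.3))
print(shrink_cov(R, alpha=0.5))
```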
Another way to improve parameter estimates is to use subjective views, and the most popular way of doing that is the so-called Black-Litterman method. More recently, people have started to use non-parametric, nearest-neighbour-like methods to estimate performance, because people have started moving away from parametric models like mean-variance and towards more data-driven models. The idea here is to observe the current return r, go back into the past, and find all those times t where the return was close to the current return. So this is the current return, this is the return at some time t in the past, and you want it to be pretty close to the current return. Then, for all those times t, find out what happened at time t plus 1, and use that as a sample of what is going to happen in the future. These nonparametric methods are currently at a very theoretical level, but there is a possibility that they will provide a better way of doing portfolio selection in the future.
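As an illustration of the nearest-neighbour idea, here is a minimal sketch with made-up data, an arbitrary choice of k, and Euclidean distance as the notion of "close"; the lecture does not specify these details.

```python
import numpy as np

def nn_forecast_samples(past_returns, current_return, k=10):
    """Nearest-neighbour resampling: find the k past times t whose return vector
    is closest to today's, and return what happened at t+1 as a sample of what
    may happen next."""
    history = np.asarray(past_returns)                       # shape (T, d)
    dists = np.linalg.norm(history[:-1] - current_return, axis=1)
    nearest_t = np.argsort(dists)[:k]                        # indices of the k closest past days
    return history[nearest_t + 1]                            # their next-day returns

# hypothetical daily returns on 3 assets
rng = np.random.default_rng(1)
R = rng.normal(0.0, 0.01, size=(500, 3))
samples = nn_forecast_samples(R, current_return=R[-1], k=20)
print(samples.mean(axis=0))                                  # crude estimate of next-day mean return
```

The returned rows can then be used as an empirical sample of next-period returns, for example to estimate a conditional mean or to feed a data-driven portfolio rule.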