Welcome to Module 2. We are going to have a discussion on the use of factor models, and in particular on how factor models can be used in a portfolio construction context, which is where they have proven most useful. So first of all, let's think about the following: if you don't have a factor model in the first place, how would you go about estimating the risk and return parameters that you need for constructing portfolios? Well, what you would do is take a look at sample-based estimates for expected returns, for variances, and for covariances or correlations as well. You can use Python, or even Excel if you will, to do this very straightforward statistical analysis. But the problem here is that the sample-based estimates that you're going to get will very likely show a profound lack of robustness. I'm going to explain why. These parameter estimates are going to be very sample-dependent, typically for different reasons. In particular, sample-based estimates for expected returns are known to be very noisy. So we are trying to improve on that situation by using, typically, a factor model. The way we do that is as follows: let's take, for example, the simple one-factor CAPM, the Capital Asset Pricing Model, as an example of a factor model that we can use to generate more robust expected return parameter estimates. So what you do in step one is run this market model regression: you're going to try and estimate the Alpha and the Beta of your model. In other words, you are going to try and estimate the exposure of the stock with respect to the underlying market return, and also the abnormal return, which we call Alpha i in this case.
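The step-one regression described above can be sketched as follows. This is a minimal illustration on synthetic data: the return series, the true Beta of 1.2, and the noise levels are all made up for the example, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic excess returns (illustrative only): one market factor, one stock
n_obs = 260
r_mkt = rng.normal(0.0003, 0.01, n_obs)            # market excess returns
true_beta = 1.2                                    # assumed for the example
r_stock = 0.0001 + true_beta * r_mkt + rng.normal(0, 0.008, n_obs)

# Step one: market model regression  r_i = alpha_i + beta_i * r_m + eps
X = np.column_stack([np.ones(n_obs), r_mkt])       # intercept + market return
alpha_hat, beta_hat = np.linalg.lstsq(X, r_stock, rcond=None)[0]

print(f"alpha_i hat = {alpha_hat:.5f}, beta_i hat = {beta_hat:.3f}")
```

With enough observations, beta_i hat lands close to the true exposure, while alpha_i hat is small and noisy, which is exactly the estimate the next step will set to zero.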
By doing so, you're going to come up with parameter estimates: Alpha i hat, which is your best sample-based estimate for Alpha, and Beta i hat, which is your best estimate for the exposure of the security with respect to the market. Then you are going to use the factor model by imposing some structure on the problem and taking Alpha i equal to zero. In other words, even if the data tell you that the stock may be mispriced, with a positive or negative Alpha, you are going to impose some structure and say: well, let me take the Alpha to be zero, which would be the case if the CAPM were the true asset pricing model. Then, as a proxy for the expected excess return on that stock, you take Beta i hat, your estimate of market exposure, multiplied by some proxy for the market risk premium. That's the typical way you go about using a factor model, a single-factor model in this case, to estimate expected returns. Now, the beauty is that the analysis can be extended in a very straightforward manner to a multi-factor setting. Remember that we argued that even though we don't know what the true asset pricing model is, we have reasons to believe that it's not a single-factor model. We have reason to believe that multiple factors can be useful at explaining differences in expected returns. Well, that's precisely what you get with this arbitrage pricing line. It's actually a hyperplane, which is a natural generalization of the security market line that we just talked about. In this case, we are trying to explain the excess return on a given stock by differences in exposure with respect to different factors.
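The step-two logic, imposing Alpha i equal to zero and using the CAPM structure, amounts to a one-line calculation. The Beta estimate and the market risk premium proxy below are illustrative assumptions, not values from the lecture.

```python
# Step two (illustrative): impose alpha_i = 0 and use the CAPM structure.
# beta_hat would come from the market model regression; the market risk
# premium proxy is an assumption (e.g. a long-run historical average).
beta_hat = 1.2                 # estimated market exposure of the stock
market_premium = 0.05          # assumed annual market risk premium proxy

# CAPM-implied expected excess return: E[R_i] - r_f = beta_i * lambda_m
expected_excess_return = beta_hat * market_premium
print(f"CAPM expected excess return: {expected_excess_return:.1%}")
```

Notice that the regression's Alpha i hat is deliberately discarded: the structure of the model, not the sample mean, drives the expected return estimate.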
So what we are saying is that there's this linear decomposition, which lies at the heart of the arbitrage pricing theory developed by Steve Ross in 1976, and which says that, based on very mild assumptions related to the absence of arbitrage, you can decompose the excess return on a security in terms of the sum of Beta i k times Lambda k, Beta i k being the exposure of security i with respect to factor k, and Lambda k being the risk premium associated with factor k. In other words, it's a very parsimonious decomposition of excess returns. As long as you're comfortable estimating the Betas, which is an important question for which, by the way, machine learning will provide very useful insights in terms of how to improve those estimates, you come up with a proxy for the risk premium on the factors, and then you can back out a reasonable estimate for the risk premium on securities. Now, you can use the same approach not only to estimate expected returns but also to estimate risk parameters. We have reasons to believe that risk parameters are somewhat more robust in terms of parameter estimation when it comes to using data. In particular, for risk parameters, the recommendation would be to use higher-frequency data, so as to use as many data points as possible over a given sample period and improve the quality of your parameter estimates. That technique of increasing the frequency of the data happens not to be very effective in the case of expected returns, because it can be shown that for expected returns, regardless of the frequency of the data, you end up getting the same parameter estimate, which essentially depends on the first and the last data point in the sample. That is essentially why sample-based estimates of expected returns are very noisy, hence the justification for using a factor model.
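The APT decomposition above is just a matrix product: stack the exposures Beta i k into an n-by-K matrix, multiply by the vector of factor premia Lambda k, and you get every security's expected excess return at once. The betas and premia below are made-up numbers for illustration.

```python
import numpy as np

# APT decomposition (illustrative numbers): excess return on security i
# is sum_k beta_ik * lambda_k. All betas and premia here are assumptions.
betas = np.array([                # n = 3 securities, K = 2 factors
    [1.0, 0.3],
    [0.8, -0.2],
    [1.2, 0.5],
])
lambdas = np.array([0.05, 0.02])  # assumed annual factor risk premia

# One matrix product gives every security's expected excess return
expected_excess = betas @ lambdas  # 0.056, 0.036, 0.070
print(expected_excess)
```

The parsimony is visible in the shapes: n times K exposures plus K premia, rather than one free expected return per security.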
Now, in the case of risk parameter estimates, if we had few parameters to estimate, we would actually be in good shape, and we might not even need a factor model to do the job, simply because, as we said, these estimates can be made sharp and pretty robust. The problem, though, is that the number of parameters to estimate increases very fast as we increase the number of assets in the universe: it grows like n squared divided by two, where n is the number of assets in the universe. To look at an example, if you have 100 assets, then you end up having to estimate something like 5,000 correlation parameters. That's a lot of parameters to estimate, and we call this problem the curse of dimensionality. Well, that's where a factor model comes in handy, because the factor model will precisely allow you to reduce the dimensionality of the problem. As opposed to having to estimate the covariance or the correlation between each pair of individual assets, we're going to try and back out what the covariance between any pair of assets would be as a function of their common exposures with respect to the underlying set of systematic factors. So as you can see from this decomposition, we can not only decompose the variance of any asset in terms of loadings and factor variances, but we can also decompose the covariance terms. What we get in the end comes at the cost of an assumption, which is again imposing some structure on the problem: we are going to assume that the specific returns that remain for asset i and asset j are uncorrelated. Which of course would be the case if we have done a proper job at extracting the commonalities between asset returns; then, by construction, whatever is left would be uncorrelated.
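The covariance decomposition described above can be sketched as a product of small matrices: exposures B, factor covariance F, and a diagonal matrix D of specific variances (diagonal precisely because specific returns are assumed uncorrelated). All numbers below are illustrative assumptions.

```python
import numpy as np

# Factor-implied covariance (illustrative): Sigma = B F B' + D, where B
# holds factor exposures, F is the factor covariance matrix, and D is
# diagonal because specific returns are assumed to be uncorrelated.
B = np.array([[1.0, 0.3],
              [0.8, -0.2],
              [1.2, 0.5]])                  # n = 3 assets, K = 2 factors
F = np.array([[0.04, 0.01],
              [0.01, 0.02]])                # factor covariance (assumed)
D = np.diag([0.02, 0.03, 0.01])             # specific variances (assumed)

Sigma = B @ F @ B.T + D
# Every off-diagonal entry comes entirely from common factor exposures
print(np.round(Sigma, 4))
```

Each pairwise covariance is backed out from the two assets' common factor exposures, rather than estimated directly from the data.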
Now, in reality we work with imperfect factor models, so there may be some commonalities left, but we're going to assume that away, precisely in an attempt to impose more structure on the problem. Well, if you do this, as the equations suggest, then to get an estimate for the covariance Sigma i j between asset i and asset j, as opposed to having a lot of n-squared-type terms to estimate, you only have to estimate, for each individual stock, the exposures Beta i k with respect to each factor k, for k equal to one to capital K. In other words, as long as the number of factors is small compared to the number of securities, this is an extremely powerful approach for reducing the curse of dimensionality, because you're eventually left with n times K parameters to estimate as opposed to n squared. Again, when K is much smaller than n, that's a big improvement.
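The dimensionality reduction can be made concrete with a quick parameter count, here using the 100-asset example from earlier and an assumed factor count of 5.

```python
# Parameter-count comparison (illustrative): the sample covariance matrix
# needs roughly n^2 / 2 pairwise estimates, while a K-factor model needs
# about n * K factor exposures. n and K below are assumed for the example.
n, K = 100, 5                       # universe size and number of factors

sample_pairs = n * (n - 1) // 2     # distinct correlations to estimate
factor_params = n * K               # factor exposures to estimate

print(sample_pairs, factor_params)  # 4950 500
```

Going from roughly 5,000 correlation parameters to 500 exposures is exactly the n-squared-to-n-times-K improvement the lecture describes.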