I promised you that today would be conceptual, and I think I'm delivering on that promise. I also told you that, unlike in accounting, where I did things that were explicitly focused, in statistics I can't do that. I have to give you the background of why we are using what we are using. Having said that, what I'm covering is not that difficult to do. But I hope that before you came here, like you did before you started measuring covariance and correlations, you went and looked at the note, reviewed the formulas, and practiced some examples in your homework and assessments, and that you are now joining me for regression. Regression is used in every discipline that I know. My brother's a doctor; he uses it in his data analysis. The people in finance use it all the time. I almost feel like it was created for finance. And people in fields that never thought of using statistical analysis now do, because it's everywhere. Data is everywhere, and one of the reasons it's become quite simple to do this is that we have the horsepower, the machine power, to execute. And by the way, the only caution about statistics is this: because it's so powerful, like finance, it is also susceptible to misuse. So now I'm going to introduce you to a topic that can take an entire week. It's called regression. I'm going to show you why and how it's used in finance later. But for now, just use an example that you would like or understand in the real world. Okay, and a different one every time. So let's start. This time let me write out the regression equation on top. It is written in textbooks like this: y i = alpha + beta x i + epsilon i. Actually, let me just remove the subscript: y = alpha + beta x + epsilon. The version with the subscript is what you'll see in every book. I've removed the subscript because it's not needed here. Okay. This is what a regression is. The first thing that should strike you when you stare at it is this. Are there any squares in there? Are there any cubes in there? No. A regression, in its most commonly used version, is linear.
And linear relationships are the simplest to understand. And it turns out, and please remember what I'm saying now, that if x and y have a joint distribution, meaning they are correlated phenomena occurring jointly, and if they're jointly normal, the only relationship that's possible between them is linear. Isn't that really profound? It's like this: if you have a normal distribution, the only measure of average behavior is the mean. What is the only measure of uncertainty of a single thing? Variance. Similarly, the only relationship that can happen between two things is a linear relationship. This is so powerful, and therefore it underlies a lot of the way finance works. Of course, it's dependent on an assumption, like everything else, okay? Okay, so this is linear. Now, let me draw the visual version of this. So what is this? This is y and this is x. Let's, for simplicity, call x advertising. And let's call y sales. I'm just taking different examples because I don't want you to think that we are doing this just for the heck of finance, no, no, no. Finance really relies on statistics. But statistics has much broader uses. So I want to explain it to you regardless of your background. So suppose you are in sales, working for a company. Let me ask you this. Do you expect a relationship there? If you were sure there wouldn't be any relationship, would it be good to spend money on advertising? Probably not. But if I asked you what's the best advertisement for your service or your product, I hope you would say the service or the product itself. In fact, that's why I love Apple. Apple basically doesn't believe in advertising, because it believes the product it gives you is something you never wanted but now are dying to have. And therefore everything is built into the product. That's the genius, anyway. Okay. So let's do y and x. What do you get? For each i you draw a dot, and you get something like this. This is called a scatter plot.
Because that is what it looks like, a scatter plot. I love these words. I love statistics also because it says what it means. Right? There's no BS in statistics. And there's a lot of BS in life anyway. Okay. So, what is y? These are all the possible dots, and what is this straight line trying to do? It's going to try and fit a relationship between the two. What you notice is that there's an intercept here. And there's a slope to this line. So I have drawn this line, and now let us talk about regression. What I won't do right now is tell you the formulas for things; you can go figure those out. I'm here just to show you how the regression measures start. So let me ask you the first question here: what the heck is alpha? Seriously, what is that alpha, that first thing after "y equals"? Well, let's stick with the sales and advertising problem. I'll give you a second. You could say it's the intercept. It turns out you're right. It is the intercept. On which axis? The y-axis. But telling me it's the intercept is like throwing jargon at me. Tell me in English. As soon as you said alpha is the intercept, you should have figured out what I was going to ask you. Remember, I keep asking you: just tell me in English what alpha is supposed to be. Alpha is that amount of sales that happens regardless of advertising. All right, and hopefully it's positive. Negative sales is tough to visualize, but suppose it's 0; then, you know, you may have to work very hard to sell anything. So, anyway, alpha is that amount of sales that requires no advertising. Makes sense, right? x is 0. Let me ask you this: what is beta over there? What is this guy? Turns out it helps if you have measured sales and advertising in percentages or in changes, and I would recommend very strongly doing that, because there is a phenomenon called a spurious, or artificial, relationship, which happens when you measure things in levels, especially in economics.
Why? Because levels tend to increase just because of inflation. That's obviously common to both variables, but it's not what you're after. So let's assume sales is in changes and advertising is in changes. And the good news in finance is that we're after returns, which are obviously changes. What does beta reflect? Beta reflects the slope. And by the way, if you say slope in Excel, it will do beta for you. So guess what it will ask you? As soon as you type equals slope, it'll ask you where your y's are and where your x's are. Are you with me? So beta says, in this change context: if x changes by 1%, by how much does y change? It's telling you the sensitivity of sales to advertising, in this example. So suppose beta is zero. What will you do? You stop advertising, or become much more creative in advertising. And you're hoping it's positive. That's why the slope I drew was positive. So that's beta. So what is beta? Beta has to be measuring the relationship between x and y. And I promised you that I wouldn't do it, but let me do it. I am going to rewrite just the equation: y i = alpha + beta x i + epsilon i. And I'm going to make y and x changes, because that's the best way to measure things. What does a change in x do to a change in y? Turns out beta is equal to the covariance of the change in y and the change in x, divided by the variance of the change in x: beta = cov(change in y, change in x) / var(change in x). So, does the numerator look familiar? Sure. That's what gives the basic sign of the relationship. The only difference between this and correlation is that I'm standardizing by the variance of x instead of by the standard deviations of the two. So beta can be any number. In other words, it can be negative, positive, greater than one, and so on, whereas correlation had to be between minus one and plus one. So beta is another way of measuring relationships. Now look at the last term in the equation. Let me ask you what this is. And don't tell me error. Because if you say error, you're talking to me in Japanese. I'm sorry, I mean. You know what I mean.
You're talking to me in a language I don't know. Japanese is an awesome language. I meant that you're talking to me and I'm not understanding what you're saying. So, assume this is the error from the line: the distance from the line of this dot, this dot, this dot, this dot, this dot. On average, what should it be? Zero. Because if on average the errors are not zero, there's some pattern there that I should pick up either with x or with some other variable that would explain why. But the way I say what epsilon is, is very simple. Epsilon is my level of ignorance, not about the relationship, but about what drives y. The more ignorant I am, the more the error will matter. So if advertising affects sales, and I can come up with other things that could explain the changes in sales, my errors will become smaller, or the variance of those errors will become smaller. That's the goal of a good model, but you don't add variables randomly. So, let me ask you this, and this is the last point I'll make, and then we'll take a break. Sales and advertising, right? The point I'm trying to make here is that one variable is on the right-hand side and one variable is on the left-hand side. What dictates which variable goes on the left and which on the right? Theory, concept, thinking. You cannot just put advertising on the left-hand side and sales on the right-hand side. Most people will say sales is determined by advertising more likely than advertising is determined by sales. So that kind of thinking has to happen before you run regressions. So let's take a break here. What we'll do next is spend 15 to 20 minutes trying to motivate, with data, the notion of portfolios. I'll show you some data, and then next week we'll get into the details of diversification and finally come up with a measure of risk. Let me ask you, what do you think it is called, given that it's based on relationships?
What do you think the measure of risk in finance is called? It's called beta, because in the end you'll see that the only thing that matters is relationships, and that's simply because we are risk averse and we don't put all our eggs in one basket. See you in a little while. Bye now.
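To make the slope formula from this session concrete, here is a minimal Python sketch, not something shown in the lecture. It computes beta as cov(change in y, change in x) / var(change in x), checks that the errors average to zero, and compares beta with correlation. The data are made up for illustration, the names `fit_line`, `adv_change`, and `sales_change` are hypothetical, and the intercept formula alpha = mean(y) - beta * mean(x) is the standard least-squares result rather than something stated above.

```python
import numpy as np


def fit_line(x, y):
    """Return (alpha, beta) for y = alpha + beta * x, with beta = cov(y, x) / var(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_yx = np.cov(y, x, ddof=1)[0, 1]   # sample covariance of y and x
    var_x = np.var(x, ddof=1)             # sample variance of x
    beta = cov_yx / var_x
    # Standard least-squares intercept (an assumption here; not derived in the lecture).
    alpha = y.mean() - beta * x.mean()
    return alpha, beta


# Made-up percentage changes in advertising (x) and sales (y), for illustration only.
rng = np.random.default_rng(0)
adv_change = rng.normal(0.0, 5.0, size=200)
noise = rng.normal(0.0, 2.0, size=200)          # plays the role of epsilon
sales_change = 1.0 + 0.6 * adv_change + noise   # "true" alpha = 1.0, beta = 0.6

alpha, beta = fit_line(adv_change, sales_change)
residuals = sales_change - (alpha + beta * adv_change)
corr = np.corrcoef(sales_change, adv_change)[0, 1]

print("alpha:", alpha)
print("beta:", beta)
print("mean residual:", residuals.mean())   # zero up to rounding error
print("correlation:", corr)                 # always between -1 and +1

# Beta is not bounded the way correlation is: rescaling y rescales beta.
_, beta_scaled = fit_line(adv_change, 10.0 * sales_change)
print("beta after multiplying y by 10:", beta_scaled)
```

Multiplying y by 10 multiplies beta by 10 but leaves the correlation unchanged, which is the sense in which beta can be any number while correlation stays between minus one and plus one.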