Statistics needs to understand us and our attitude toward risk. So, very quickly, repeating something: we like more return, but we dislike risk. We therefore will not put all our eggs in one basket. We will hold portfolios. We will diversify. We'll keep coming back to this. This simple fact underlies all the statistics and everything that follows. And by the way, this whole setup, this simple risk-return relationship dictated by this phenomenon, our attitude toward risk, underlies all the profound work done by finance people over the last 50 years. And at least two Nobel Prizes have to do with this, with what we are going to talk about for the rest of this week and the following week. Now, for some statistics. And I'm going to use a pen and paper to write a lot. And by that I mean an electronic pen and electronic paper. So, let me start off with the following. I want to draw a graph. And I'm going to draw a distribution, OK? And I'm going to call this point something. It'll be a central point, a point of central tendency. Then I'll try to characterize this behavior, departures from here, and then I'll try to do something else called How Do You Measure Things, and Relationships Between Things. That's the goal, but starting off, there's a distribution. Can somebody tell me what this distribution looks like? I've been pretty cool about the drawing. It's called a normal distribution. And we are going to largely stick with normal distributions, because a lot of things start off looking non-normal, with very strange behavior. But when you look at the distribution level, with a lot of numbers, a lot of phenomena tend to converge to normal. Let's say this one is normal. But the most important thing about this is this. On this axis I've got probabilities. And on this axis I'll say I have got the phenomenon, and I'll call the phenomenon y. Now very quickly, what could this be about? And if it's okay with you, I do not want to teach statistics in a dry fashion.
You have chapters from books in finance that I'm asking you to look at. And you also have books on statistics that I am sure you're aware of or can google. But what I want to tell you, again, is the essence of it, and today is a little bit dry. But I want you to practice whatever we are doing, okay? I keep repeating that. So, what is y? Think of y as anything. And I'm going to call it y_i. So, think of y as a distribution of the heights of all the people taking this class. Do you agree that it'll be distributed all over the place, right? Hopefully nobody has a negative height. So, we are not going in that direction. So, I'm going to take height as an example. So, you have a distribution. Not everybody's exactly five feet tall, right? If they were, what would this height be? Remember, this is probability. This height would be what? One. And there would be no tails, no distribution. All of this would collapse into this one height. And if everybody's height were exactly five feet, we wouldn't need to worry about statistics. Turns out the real world is not like that. Distributions are around some normal behavior, and they look normal. We are going to assume that, largely, for finance. Okay. So, this is basically reflecting the fact that I do not know something for sure. Go back to our example of a government bond giving me a return of 3%. Then the probability is one, simply because I know that, even though the real world can be bad or good, those possibilities have been knocked out. Okay? So, that's the notion of the distribution. I'm now going to talk about a few characteristics of this distribution, which may be very familiar to people who have a statistical background, but not familiar to others, okay? So, let's stick with our problem, and let's suppose I know the distribution of possible heights. I want to calculate what is normal. Imagine if, in my head, I had to keep the heights of all the people taking all the classes in the university. It would be mind-boggling. So what do we do?
A distribution characterizes all possibilities. But then I ask myself, what is the average height, right? And this we have many times called expectation, and if the distribution is normal, we need only worry about the mean. Turns out the beauty of the normal distribution is that if I fold it over, if I carry this side over, the two halves will look like one perfect line. Right? It's very symmetric. So, the mean is right in the middle. And the mean will also be equal to the mode and the median. I'm getting a little geeky now. These are two other ways of measuring what is called central tendency. So, why am I interested in what's happening on average? Because that's how most people think about the future. Hey, on average, what will be my cash flow next year? 100. But will it be exactly 100? No, it could be 90, it could be 110. I hate the 90, but I like the 110. That's where the hypocrisy comes in, okay? So: mean, median, mode. And sticking with heights, let's figure out how you calculate that mean, okay? And I've given you examples with returns and so on because we are doing finance, but I'm just getting a little excited here. So, the way you figure out the mean of y, which you'll call y bar, and which theoretically is also called the expectation of y, is this. What will you do? You'll take the values of y, all the values of y_i. And you'll multiply them by p_i, which is what? The probability. So, you'll take each y_i, multiply it by its probability p_i, and sum. Over how many? All n possible. So, i goes from one through n; n is the sample size, okay? So what is it saying? It's saying multiply the probability by the height. If the probability of being five foot seven is one percent, you multiply five foot seven by one percent. That's how you get the first term. And so on. I want to just emphasize this way of doing things, because I think people forget it. The usual way of saying it, and I'm going to write it up here, is this: the summation of y_i, divided by n. That's the usual way you'll see it done. Even in Excel.
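To make the two ways of writing the mean concrete, here is a minimal Python sketch. The heights and probabilities are made-up illustrative numbers, not data from the class, and the function names `expected_value` and `simple_average` are hypothetical names chosen for this example. It shows that the probability-weighted sum of p_i times y_i gives the same answer as "sum and divide by n" once every p_i is set to 1 over n.

```python
# Hypothetical heights in inches and their probabilities (illustrative only).
heights = [64.0, 66.0, 68.0, 70.0, 72.0]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]


def expected_value(values, probabilities):
    """Probability-weighted mean: the sum over i of p_i * y_i."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * y for p, y in zip(probabilities, values))


def simple_average(values):
    """The usual average: sum of y_i divided by n (implicitly p_i = 1/n)."""
    return sum(values) / len(values)


# Mean using the stated probabilities.
y_bar = expected_value(heights, probs)

# Mean using equal probabilities 1/n, which is what "sum and divide by n" assumes.
equal_weights = [1.0 / len(heights)] * len(heights)
y_bar_equal = expected_value(heights, equal_weights)

print("probability-weighted mean:", y_bar)
print("equal-weight mean:        ", y_bar_equal)
print("sum divided by n:         ", simple_average(heights))
```

With a symmetric set of probabilities like the one assumed here, the weighted mean and the simple average happen to land on the same number; with lopsided probabilities they generally would not.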
So, when you do the mean, or average as it's called in Excel, you tell it what the y's are. They're already in the spreadsheet, going from A1 to A100. If there are 100 observations, you just sum them and divide by n. You're making an assumption when you do that, and the assumption is, what is each p_i? 1 over n. That means that the chance of each height entered in your Excel spreadsheet is the same. And by the way, there's a note that tells you how to do that in Excel. It's so simple: you just tell Excel to do average. And when we have the time, towards the end, we may do that. But I'm not inclined to do that right now; I just want you to understand. It's very straightforward. Now, that assumption is built into any normal average that you calculate, right? So what is the average rainfall this year? What will they do on a weather website? They will add up the rainfall for each day and divide by 365. They're assuming that the likelihood of each thing is equal. And that's an important assumption. If you have a larger data set, it usually is an okay assumption, right? It doesn't matter that much what value 1 over n is. I want to emphasize this so that you understand. So, you've calculated what you expect will happen. However, that is not the only story. I also have to worry about uncertainty, or variance. In this case, worrying about the variance of height doesn't seem that traumatic. But let's stick with it just as an example. Worrying about the variance of returns is very traumatic, right? Especially if they're going in the negative direction. So, this is what you have. So, what have I calculated? Let's assume I've already calculated y bar. Remember, the probabilities are here, the p's. Now I look at this and I ask you the following. Okay. Are you sure? Suppose the average height in all the classes I've ever taught is five foot, eight inches. And now somebody asks me, but Gautam, are you sure that's the height? The answer is obviously I'm not sure, right?
The only way I would be sure is if this height was how much? Exactly one. Then you wouldn't have a distribution. Right? So, are you sure? And the answer is obviously not. Some people are here, some people are here. So what do you do? You do this: you take each y_i, and suppose that's this one, and you subtract y bar from it. Why? Because y bar is the normal, the center of gravity of this behavior, the normal behavior. So you've got a deviation from it. In this case it's positive. In this case it's negative. All right? Now, you have to multiply that by the chance of this happening, right? The chance of this data point happening, okay? But first you have to do another thing: you have to square the deviation. Then you multiply by the probability and sum across all possibilities, and that's called variance. And the symbol used is sigma_i squared. Quick question, think about it for a second. Why do I not just sum the deviations? Why do I square them? And the reason is? I just gave you a hint. The mean is the center of gravity. So, what will happen? The positives and the negatives will cancel each other out. And what will you get every time? Zero. So, there's no point in that, because zero variance is only true for what? Something you're 100% sure about: everybody's 5'8" tall, or I'm going to get my money for sure. So, the variance is the measure of uncertainty. However, look at its units. The units of the average are what? Inches. The unit of variance, or uncertainty about your estimated or average height, is inches squared. So, what do we do? To make it the same unit, we take the square root of sigma_i squared, which we call sigma_i, which we call standard deviation. By the way, one thing very important to note about normal distributions: just like the mean is the average, is also the median, is also the mode, similarly the only measure of uncertainty is standard deviation. If you do more strange distributions, you will get things like skewness. I don't want to get into those because that's not the purpose of this class.
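The variance and standard deviation steps described above can be sketched in a few lines of Python. The heights and probabilities below are made-up illustrative numbers, and the function names are hypothetical, chosen only for this example. The sketch also checks the point about cancellation: the probability-weighted deviations, left unsquared, sum to zero.

```python
import math

# Hypothetical heights in inches and their probabilities (illustrative only).
heights = [64.0, 66.0, 68.0, 70.0, 72.0]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]


def mean(values, probabilities):
    """y bar: the sum over i of p_i * y_i."""
    return sum(p * y for p, y in zip(probabilities, values))


def weighted_deviation_sum(values, probabilities):
    """Sum of p_i * (y_i - y bar): positives and negatives cancel out."""
    y_bar = mean(values, probabilities)
    return sum(p * (y - y_bar) for p, y in zip(probabilities, values))


def variance(values, probabilities):
    """Sigma squared: the sum over i of p_i * (y_i - y bar) squared."""
    y_bar = mean(values, probabilities)
    return sum(p * (y - y_bar) ** 2 for p, y in zip(probabilities, values))


def standard_deviation(values, probabilities):
    """Sigma: square root of the variance, back in the units of y (inches)."""
    return math.sqrt(variance(values, probabilities))


print("unsquared deviations sum to:", weighted_deviation_sum(heights, probs))
print("variance (inches squared):  ", variance(heights, probs))
print("standard deviation (inches):", standard_deviation(heights, probs))

# A "sure thing": everybody is exactly 68 inches tall with probability one.
print("variance of a sure thing:   ", variance([68.0], [1.0]))
```

The last line mirrors the Treasury-style case of something known for sure: one possible value with probability one, so the variance is zero.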
At higher levels there are other possibilities, including skewness, if you keep doing finance. It depends on your assumption about the distribution. But for now, let's take the standard deviation. Okay. I'm going to keep going, and I'm going to first emphasize why we will not stop here. Think about it: normal distribution, you know the expected value, you know the uncertainty. Aren't we done with it? We know the measure of risk, right? Because we know the variance will be zero in the cash flows of which instrument? A Treasury bill, if you're holding one. But if we are holding a corporate bond, what will the variance be? Positive. Right? So why worry about anything more? Why not simply stick with not knowing the world, and characterizing expectation by the mean or average and uncertainty by the variance? Well, there's a reason for it, and I'm just going to give you a flavor of the reason before I do the statistics, because we are going to get into the details of this concept big time next week when we talk about the measurement of risk. Why is the variance of a single security not enough? It turns out it's because we are not risk lovers; we are averse to risk. We hold portfolios. In fact, I don't know anybody in the world who has money to invest who doesn't hold portfolios. It would be silly to put all your eggs in one basket, assuming you're risk averse. And human behavior is risk averse. There's enough data to show it, and I'll show you more as we go along, including today. We hold portfolios, and portfolios are a collection of things. They are not single things. So, imagine a world in which each one of us was holding just one thing, either Apple, Google, GE, and so on, and that was our behavior. If that's what the world were like, then variances and means would be enough. That's not what the world is like. Turns out, I know this ahead of time; in fact, we knew it in the cave. When she was ready to go hunting outside for the first time, guess what the guy said? Hey, don't put all your eggs in one basket. That means diversify.
Try to do different things so that you have different ways of collecting food, so that you survive, right? So risk aversion implies holding portfolios. A portfolio means a collection of things, not single things. And that means relationships. We have to figure out relationships and how to measure them. Let me ask you this, a simple example. I know I use very bizarre examples, again nothing to do with finance. Suppose human beings could survive by themselves. Just by themselves. Each single person. Nothing to do with anybody else. Well, that's one world, but what happens? We believe in groups; especially in business school, we teach group work. So imagine, if you will, that you are the only person in the group. You have only one thing. Now let me ask you, if you have a collection of things, it's called a team. So think about it. I could look at your behavior alone if you were the only thing determining everything, right? You operate individually. But suppose you operate in groups. And let's take a group of two. What have I done? How many personalities? Two personalities. But what else have I introduced? I have introduced relationships. How many? Me and Ryan, Ryan and me, are a team doing this. What is important now is not just his personality and my personality. What is important is my relationship with him and his relationship with me. So, as soon as you collect things in a portfolio, the connections matter. We've got to be able to measure relationships, and that's what we'll try to do after the break, using statistics. So, please take a break, and we'll come back to How Do You Measure Relationships.