>> We're now going to review some of the basic concepts from probability. We'll discuss expectations and variances, we'll discuss Bayes' theorem, and we'll also review some of the commonly used distributions from probability theory. These include the binomial and Poisson distributions, as well as the normal and log-normal distributions. First of all, I just want to remind all of us what a cumulative distribution function is. A CDF, which we'll denote F of x, is defined to be the probability that a random variable X is less than or equal to little x. For discrete random variables, we also have what's called a probability mass function, which we'll denote with little p. It satisfies the following properties: p is greater than or equal to 0, and for all events A, the probability that X is in A is equal to the sum of p of x over all those outcomes x that are in the event A. The expected value of a discrete random variable X is then the sum of the possible values of the random variable, the xi's, weighted by their probabilities, p of xi. Let me give you an example. Suppose I toss a die. It takes on 6 possible values, 1 through 6, each with probability 1/6. So, in this case, the probability that X is greater than or equal to 4 is 1/6 for 4, plus 1/6 for 5, plus 1/6 for 6, which equals 1/2. Likewise, we can compute the expected value of X: it's 1/6 times 1, plus 1/6 times 2, and so on, up to 1/6 times 6, and that comes out to be 3.5.
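As a quick check on the arithmetic, the die example can be reproduced in a few lines of Python. This snippet isn't from the lecture; it just evaluates the event probability and the expectation exactly using rational arithmetic:

```python
from fractions import Fraction

# A fair die: outcomes 1..6, each with probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# P(X >= 4): sum p(x) over the outcomes in the event {4, 5, 6}
p_at_least_4 = sum(p for x, p in pmf.items() if x >= 4)

# E[X]: each outcome weighted by its probability
expected = sum(x * p for x, p in pmf.items())

print(p_at_least_4)  # 1/2
print(expected)      # 7/2, i.e. 3.5
```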
So, we also have the variance of a random variable. It's defined as the expected value of X minus the expected value of X, all squared. And if you expand this quantity out, you'll get the alternative representation: the variance of X is also equal to the expected value of X squared, minus the expected value of X, all squared. So, that's discrete random variables, probability mass functions, and so on. Now, let's look at a couple of distributions. The first distribution I want to talk about is the binomial distribution. We say that a random variable X has a binomial distribution, and we write X tilde Bin(n, p), if the probability that X equals r is n choose r, times p to the r, times 1 minus p to the n minus r. And for those of you who have forgotten, n choose r is equal to n factorial divided by r factorial times n minus r factorial. The binomial distribution arises, for example, in the following situation. Suppose we toss a coin n times and count the number of heads. Then the total number of heads has a binomial distribution, where we're assuming these are independent coin tosses, so that the result of one coin toss has no influence on the outcome of the other coin tosses. The mean and variance of the binomial distribution are given by the expected value of X equals n times p, and the variance of X equals n times p times 1 minus p. Now, there's actually an interesting application of the binomial distribution to finance, and it arises in the context of analyzing fund manager performance. We'll return to this example later in the course, but let me just give you a little flavor of it now. Suppose a fund manager outperforms the market in any given year with probability p, and that she underperforms the market with probability 1 minus p.
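To make these formulas concrete, here's a short Python sketch (the values n = 10, p = 0.3 are just illustrative, not from the lecture) that computes the binomial pmf directly and checks the mean and variance against np and np(1 − p), using the identity Var(X) = E[X²] − E[X]²:

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = (n choose r) * p^r * (1 - p)^(n - r)
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3  # illustrative parameters
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
second_moment = sum(r**2 * binomial_pmf(r, n, p) for r in range(n + 1))
var = second_moment - mean**2  # Var(X) = E[X^2] - E[X]^2

print(mean)  # n*p = 3.0 (up to floating point)
print(var)   # n*p*(1-p) = 2.1 (up to floating point)
```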
So, we're assuming here that the fund manager either outperforms or underperforms the market, so there are only two possible outcomes, and that they occur with probabilities p and 1 minus p, respectively. Suppose this fund manager has a track record of ten years, and that she has outperformed the market in eight of those ten years. Moreover, let's assume that the fund manager's performance in any one year is independent of her performance in other years. So, a question many of us would like to ask is the following: how likely is a track record as good as this, outperforming in eight years out of ten, if the fund manager had no skill? And, of course, if the fund manager had no skill, we could assume that p equals 1/2. We can answer this question using the binomial model, or the binomial distribution. So, let X be the number of outperforming years. If the fund manager has no skill, then over the ten years the total number of outperforming years X is binomial with n equals 10 and p equals 1/2. We can then compute the probability that the fund manager does at least as well as outperforming in eight years out of ten by calculating the probability that X is greater than or equal to 8. What we're doing here is calculating the probability that the fund manager would have 8, 9, or 10 years out of 10 in which she outperformed the market. And that is given by the sum of the binomial probabilities from earlier, summed from r equals 8 to n, where n, in this case, is of course 10. So, that's one way to try and evaluate whether the fund manager has just been lucky or not: one can compute this probability, and if it's very small, then you might conclude that the fund manager was not just lucky and that she had some skill. But actually, this opens up a whole can of worms.
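Here's that calculation carried out exactly in Python (a sketch, not course code), summing the binomial probabilities for r = 8, 9, 10 with n = 10 and p = 1/2:

```python
from fractions import Fraction
from math import comb

n = 10              # ten-year track record
p = Fraction(1, 2)  # no skill: outperform each year with probability 1/2

# P(X >= 8) = sum over r = 8..10 of (10 choose r) * (1/2)^r * (1/2)^(10 - r)
prob = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(8, n + 1))

print(prob)         # 7/128
print(float(prob))  # 0.0546875
```

So even with no skill at all, there's roughly a 5.5% chance of a track record this good: small, but perhaps not as small as you'd expect.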
There are a lot of other related questions that are very interesting. Suppose there are M fund managers: how well should the best one do over the ten-year period if none of them had any skill? So, in this case, you don't have just one fund manager as we had in the example so far; we now have M of them. And it stands to reason that even if none of them had any skill, then as M gets large, you would expect at least one of them, or even a few of them, to do very well. How can you analyze that? Again, you can use the binomial model, and what are called order statistics of the binomial model, to do this. We'll actually return to this question later in the course. Okay. So, let's talk about another distribution that often arises in finance and financial engineering: the Poisson distribution. We say that X has a Poisson lambda distribution, so lambda is the parameter of the distribution, if the probability that X equals r is equal to lambda to the power r, times e to the minus lambda, divided by r factorial. And for those who have forgotten factorials, which I also used in the binomial model a moment ago, r factorial is equal to r times r minus 1 times r minus 2, all the way down to 2 times 1. So, this is the Poisson distribution. The expected value and the variance of a Poisson random variable are identical and both equal to lambda. We'll actually show the result for the mean here; it's very simple, and it's calculated as follows. We know that the expected value of X is equal to the sum of the possible values of X, these are the r's, times the probability that X equals r, where r runs from 0 to infinity. We can substitute the Poisson probability from above into the sum and evaluate it. The first thing to notice is that when r equals 0, that term in the sum is equal to 0.
So, we can drop the r equals 0 term and let the summation run from r equals 1. We can then cancel the leading r against the first factor of r factorial, leaving r minus 1 factorial, and we can pull one lambda out front, leaving lambda to the r minus 1 inside the sum. Now, if we re-index the remaining sum, replacing r minus 1 with r so that it runs from r equals 0, we see that it is just the sum of the Poisson probabilities: the probability that X equals 0, plus the probability that X equals 1, plus the probability that X equals 2, and so on. The total sum of the probabilities must equal 1, so the expected value is equal to lambda. Okay, let's talk a little bit now about Bayes' theorem. Let A and B be two events for which the probability of B is nonzero. Then the probability of A given B, and this is notation we'll use throughout the course, the vertical line means it's a conditional probability, so it's the probability of A given that B has occurred, is equal to the probability of A intersection B divided by the probability of B. Alternatively, we can write the numerator, the probability of A intersection B, as the probability of B given A times the probability of A. So, this is another way to write Bayes' theorem. And finally, if we like, we can expand the denominator, the probability of B, and write it as the summation of the probability of B given Aj, times the probability of Aj, summed over all the Aj's, where the Aj's form a partition of the sample space. What do I mean by partition? Well, I mean the following: Ai intersection Aj is equal to the null set for i not equal to j, and at least one Ai must occur.
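You can also confirm numerically that the mean and variance of a Poisson random variable both equal lambda by truncating the infinite sums (a sketch; lambda = 2.5 is just an illustrative choice):

```python
from math import exp, factorial

lam = 2.5  # illustrative rate parameter

def poisson_pmf(r, lam):
    # P(X = r) = lam^r * e^(-lam) / r!
    return lam**r * exp(-lam) / factorial(r)

# Truncate the infinite sums; the tail beyond r = 60 is negligible here
N = 60
mean = sum(r * poisson_pmf(r, lam) for r in range(N))
second_moment = sum(r**2 * poisson_pmf(r, lam) for r in range(N))
var = second_moment - mean**2

print(round(mean, 6))  # 2.5
print(round(var, 6))   # 2.5
```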
And, in fact, because Ai intersection Aj is equal to the null set for i not equal to j, I can replace that last condition with the following: exactly one Ai must occur. Okay. So, that's Bayes' theorem. Let's look at an example. Here's an example where we're going to toss 2 fair 6-sided dice. Y1 is the outcome of the first toss, and Y2 is the outcome of the second toss. X is the sum of the two, and that's what we've tabulated in the table here. So, for example, the 9 here comes from a 5 on the first toss and a 4 on the second toss: 5 plus 4 equals 9. So, that's X equals Y1 plus Y2. The question we're interested in answering is the following: what is the probability that Y1 is greater than or equal to 4, given that X is greater than or equal to 8? Well, we can answer this using the conditional probability formula from the previous slide. This is equal to the probability that Y1 is greater than or equal to 4 and X is greater than or equal to 8, divided by the probability that X is greater than or equal to 8. So, how do we calculate these two quantities? Let's look at the numerator first. We need two events here: Y1 greater than or equal to 4, and X greater than or equal to 8. The first event is captured inside this box here, because this corresponds to Y1 being greater than or equal to 4; all of these outcomes correspond to that event. The event that X is greater than or equal to 8 corresponds to these outcomes here. So, the intersection of the two, where Y1 is greater than or equal to 4 and X is greater than or equal to 8, is this area here, which is very light, so let me shade it a little darker. Now, each of these cells is equally probable and occurs with probability 1 over 36. There are a total of 3 plus 4 plus 5, that's 12 cells here. So, the numerator occurs with probability 12 over 36.
And the denominator, the probability that X is greater than or equal to 8, well, that's what we highlighted in red here. The probability of that occurring: there are the 12 cells plus these 3 additional outcomes, giving 15 outcomes, so that's 15 over 36. The ratio, 12 over 36 divided by 15 over 36, is equal to 4 over 5. So, that's our application of Bayes' theorem. Okay. So, let me talk a little about continuous random variables. We say a continuous random variable X has a probability density function, or PDF, f, if f of x is greater than or equal to 0 and, for all events A, the probability that X is in A is the integral of the density, f of y dy, over A. The CDF, the cumulative distribution function, which we'll write with a capital F to distinguish it from the density, and the PDF are related as follows: F of x is equal to the integral from minus infinity to little x of f of y dy. And, of course, that's because we know that F of x, by definition, is equal to the probability that X is less than or equal to little x, which is the probability that minus infinity is less than X, which is less than or equal to little x. So, this is our event A in the definition here: A is the event that the random variable X lies between minus infinity and little x. It's often convenient to recognize the following: the probability that X is in the little interval from x minus epsilon over 2 to x plus epsilon over 2 is equal to the integral from x minus epsilon over 2 to x plus epsilon over 2 of f of y dy. And if you like, we can draw something like this. So, this could be the density, f, and maybe we've got some point here which is little x, with x minus epsilon over 2 on one side and x plus epsilon over 2 on the other.
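Since all 36 cells of the table are equally likely, the same conditional probability can be obtained by brute-force enumeration. Here's a quick sketch (not from the lecture) that counts the cells in each event:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (Y1, Y2) of two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# Count the cells in each event
num = sum(1 for y1, y2 in outcomes if y1 >= 4 and y1 + y2 >= 8)  # 12 cells
den = sum(1 for y1, y2 in outcomes if y1 + y2 >= 8)              # 15 cells

# P(Y1 >= 4 | X >= 8) = (12/36) / (15/36) = 12/15
cond = Fraction(num, den)
print(cond)  # 4/5
```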
So, in fact, what we're saying is that this probability is the shaded area, and it's roughly equal to f of x times epsilon, the height of the density at x times the width of the interval. And, of course, the approximation works much better as epsilon gets very small. Okay. So, those are continuous random variables. Let me talk briefly about the normal distribution. We say that X has a normal distribution, and write X tilde N(mu, sigma squared), if it has this density function here: f of x equals 1 over the square root of 2 pi sigma squared, times the exponential of minus x minus mu, all squared, divided by 2 sigma squared. The mean and variance are given by mu and sigma squared, respectively. The normal distribution is a very important distribution in practice: its mean is at mu, its mode, the highest point of the density, is also at mu, and approximately 95% of the probability lies within plus or minus 2 standard deviations of the mean. So, this is a very famous distribution. It arises an awful lot in finance. It certainly has its weaknesses, and we'll discuss some of them later in the course. A related distribution is the log-normal distribution. We write that X has a log-normal distribution with parameters mu and sigma squared if the log of X is normally distributed with mean mu and variance sigma squared. The mean of the log-normal distribution is e to the mu plus sigma squared over 2, and its variance is e to the sigma squared minus 1, times e to the 2 mu plus sigma squared. Again, the log-normal distribution plays a very important role in financial applications.
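As a check on the "plus or minus 2 standard deviations" claim, the normal CDF can be written in terms of the error function. This sketch (mu and sigma are arbitrary illustrative values) shows the probability is about 95.45%, regardless of the parameters:

```python
from math import erf, sqrt

mu, sigma = 1.0, 2.0  # arbitrary illustrative parameters

def normal_cdf(x, mu, sigma):
    # Phi((x - mu) / sigma), written via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Probability of landing within +/- 2 standard deviations of the mean
p = normal_cdf(mu + 2 * sigma, mu, sigma) - normal_cdf(mu - 2 * sigma, mu, sigma)
print(round(p, 4))  # 0.9545
```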