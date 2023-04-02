When we discuss random variables, we are going to talk about two specific types. The first type is the discrete random variable. A discrete random variable takes values only from a discrete set, maybe 0, 1, 2, 3, or 1, 10, 100, or -5, -7, 9. For example, the random variable that counts the number of movies watched by a person over a period of time is necessarily discrete. On the other hand, the number of hours a user spends watching movies is not. While the number of movies watched is necessarily an integer like 0, 1, 2 or 3, the time spent watching is not necessarily an integer. Somebody could have watched a movie for one and a half hours and then left it. So the time spent is not a discrete random variable, whereas the number of movies is.
When a random variable is discrete, besides the cumulative distribution function, which is always well defined, we have something else that can be defined and can prove very useful: the probability mass function. What is the probability mass function? f_X(x_i) is defined as the probability that the random variable X takes the value x_i, that is, f_X(x_i) = P(X = x_i). It directly gives you the probability for each of the discrete values that the random variable can take.
Now, what properties should the probability mass function have? To begin with, it gives us a probability, which means it should always take a value between zero and one. Is that all? Should the probability mass function be a non-decreasing function like the cumulative distribution function?
No, there is no reason the probability mass function should be non-decreasing or non-increasing. Its values can go up or down in any order. But there is a new condition: the sum of the probability mass function values over all the discrete values that the random variable X can take should be one. This reflects one of the axioms of probability, that the probability of the sample space equals 1.
Let us look at an example of a probability mass function in Excel. Suppose I am looking at data on the number of children that various couples across the city of Ahmedabad have. The data you are looking at here could be a probability mass function. Here we are saying that with probability 0.22 a couple has no children at all, with probability 0.3333... a couple has one child, with probability 0.25 a couple has two children, and so on.
Let us compute the corresponding cumulative distribution function. One of the first checks is to ensure that the values 0 to 6 are arranged in ascending order. This is x, the value of the random variable, and this is the probability associated with that value, which gives the probability mass function. Now for the cumulative distribution function. We know a couple cannot have fewer than zero children, so the cumulative distribution function at 0 is just 0.22. But what is the probability that a couple has one or fewer children, that is, zero or one child? It is the probability that a couple has zero or fewer children plus the probability that a couple has exactly one child, which comes to 0.553333. What about the cumulative distribution function value at 2? It is 0.25 + 0.553333 = 0.803333, and so on. When we drag the formula down, we get the full cumulative distribution function.
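The spreadsheet steps above can be sketched in a few lines of Python. This is only an illustration, not the lecture's Excel sheet: the probabilities for 0, 1 and 2 children come from the example, while the probabilities for 3 to 6 children are hypothetical values chosen here so that the total is one. The names `values`, `pmf` and `cdf` are likewise assumptions made for this sketch.

```python
from itertools import accumulate
import math

# Number of children, in ascending order (the first check in the walkthrough).
values = [0, 1, 2, 3, 4, 5, 6]

# Probabilities for 0, 1, 2 are from the example; the last four entries are
# HYPOTHETICAL placeholders picked only so that everything sums to 1.
pmf = [0.22, 0.333333, 0.25, 0.12, 0.05, 0.02, 0.006667]

# PMF properties: every value lies in [0, 1] and the values sum to 1.
assert values == sorted(values)
assert all(0.0 <= p <= 1.0 for p in pmf)
assert math.isclose(sum(pmf), 1.0, abs_tol=1e-9)

# CDF at each value is the running total of the PMF ("dragging down").
cdf = list(accumulate(pmf))

# Differences of consecutive CDF values give back the PMF.
recovered = [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])]
assert all(math.isclose(r, p, abs_tol=1e-12) for r, p in zip(recovered, pmf))

for x, p, c in zip(values, pmf, cdf):
    print(f"x={x}  pmf={p:.6f}  cdf={c:.6f}")
```

The running total is the same operation as adding the previous cumulative value to the current probability in each spreadsheet row.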
Even though we calculated the cumulative distribution function in this manner, we can check that it satisfies all the properties it is meant to satisfy. For values less than 0, the CDF is equal to 0. At 0 it is 0.22, and from there you can see it only goes up, so it is clearly non-decreasing. For values of 6 and above, the cumulative distribution function is equal to 1.
Conversely, you can find the probability mass function if you know the cumulative distribution function. All you have to do is subtract consecutive values, and you get back the probability mass function. This works because the probability mass function measures the marginal, or additional, probability that gets added at each value as the cumulative distribution function accumulates.
Next, we are going to talk about a different type of random variable, called the continuous random variable. A continuous random variable takes values from a continuous set. As said before, suppose I ask you how much time a person will spend watching movies on your streaming service. This is not necessarily a number like one or two hours. It could be 1.5 hours, or 1.25 hours. Random variables that can take values across a continuous range of numbers are called continuous random variables. For continuous random variables, the cumulative distribution function is still well defined, but the probability mass function is not necessarily defined. For example, it doesn't make sense to ask for the probability that a person watches a movie for exactly 1.2379 hours. However, if I talk about intervals of numbers, the question is much easier to answer. For example, I can ask: what is the probability that a user spends somewhere between 3.5 and 4.5 hours watching movies? This is a well-defined question that can be answered. In fact, the answer can be obtained directly from the cumulative distribution function.
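The interval question above can be tried out numerically. The lecture does not say how watch time is distributed, so the sketch below assumes, purely for illustration, an exponential distribution with a mean of 2 hours. The names `MEAN_HOURS`, `watch_time_cdf` and `interval_probability` are hypothetical.

```python
import math

MEAN_HOURS = 2.0  # HYPOTHETICAL mean watch time; not given in the lecture


def watch_time_cdf(x: float) -> float:
    """P(X <= x) under the assumed exponential watch-time model."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-x / MEAN_HOURS)


def interval_probability(a: float, b: float) -> float:
    """P(a < X <= b), read off from the CDF at the two endpoints."""
    return watch_time_cdf(b) - watch_time_cdf(a)


# Probability of watching between 3.5 and 4.5 hours.
p_interval = interval_probability(3.5, 4.5)
print(f"P(3.5 < X <= 4.5) = {p_interval:.6f}")

# Shrinking an interval around 1.2379 hours drives its probability toward 0,
# which is why asking about one exact value is not informative.
widths = [1.0, 0.1, 0.001]
p_shrinking = [
    interval_probability(1.2379 - w / 2, 1.2379 + w / 2) for w in widths
]
for w, p in zip(widths, p_shrinking):
    print(f"width={w:<6} probability={p:.6f}")
```

Any other continuous CDF could be substituted for `watch_time_cdf` without changing the rest of the sketch.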
One just has to subtract the cumulative distribution function value at 3.5 from its value at 4.5, that is, F(4.5) - F(3.5) where F is the cumulative distribution function, and that gives the probability that a person watches movies for between 3.5 and 4.5 hours.
However, there is another useful entity used to represent a continuous random variable: the probability density function. The probability density function is nothing but the derivative of the cumulative distribution function. If we differentiate the cumulative distribution function with respect to x, the result is called the probability density function. Note that, unlike the cumulative distribution function and the probability mass function, the probability density function is not a probability. This means the probability density function can take values greater than one, and that is okay. In fact, we will soon look at examples where the probability density function takes values greater than one.
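A minimal sketch of the derivative relationship, under an assumption not taken from the lecture: a random variable spread uniformly over the interval from 0 to 0.5. Its CDF rises from 0 to 1 over a width of only 0.5, so its slope, the density, is 2. The function names and the finite-difference step `h` are choices made for this illustration.

```python
import math


def cdf(x: float) -> float:
    """CDF of a HYPOTHETICAL uniform random variable on [0, 0.5]."""
    if x < 0.0:
        return 0.0
    if x > 0.5:
        return 1.0
    return 2.0 * x


def numerical_pdf(x: float, h: float = 1e-6) -> float:
    """Approximate the derivative of the CDF with a central difference."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)


# The density inside the interval is about 2, which is greater than one.
density_at_quarter = numerical_pdf(0.25)

# Probabilities still come from CDF differences and never exceed one.
prob_01_to_03 = cdf(0.3) - cdf(0.1)
total_probability = cdf(0.5) - cdf(0.0)

print(f"density near x=0.25: {density_at_quarter:.4f}")
print(f"P(0.1 < X <= 0.3):   {prob_01_to_03:.4f}")
print(f"P(0 < X <= 0.5):     {total_probability:.4f}")
```

The density exceeds one only because the probability is packed into a short interval. The area under the density, height 2 times width 0.5, is still exactly one.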