So, probability calculus is useful for understanding the rules that probability must follow and forms the foundation for all thinking on probability. However, we need something that's a bit easier to work with for numeric outcomes of experiments, where here I'm using the word experiment in a broad sense. So densities and mass functions for random variables are the best starting point for this, and this is all we will need. Probably the most famous example of a density is the so called bell curve. So in this class you'll actually learn what it means to say that the data follow a bell curve. You'll actually learn what the bell curve represents. Among other things, you'll know that when you talk about probabilities associated with the bell curve or the normal distribution, you're talking about population quantities. You're not talking about statement about what occurs in the data. How this is going to work and where we're going with this is that we're going to collect data that's going to be used to estimate properties of the population, okay. And that's where we'd like to head. But first before we start working with the data, we need to develop our intuition for how the population quantities work. A random variable is the numeric outcome of an experiment. So the random variables that we will study come in two kinds, discrete or continuous. Discrete are ones that you can count, like number of web hits, or the number, the different outcomes that a die can take. Or even things that aren't even numeric, like hair color. And we can assign the numeric value, one for blonde, two for brown, three for black, and so on. Continuous random variables can take any value in a continuum. The way we work with discrete random variables is, we're going to assign a probability to every value that they can take. The way that we're going to work with continuous random variables is we're going to assign probabilities to ranges that they can take. Let's just go over some simple examples of things that can be thought of as random variables. So the biggest ones for building up our intuition in this class will be the flip of a coin, and we'll either say heads or tails, or 0 or 1. This is discrete random variable, because it could only take two levels. The outcome of the roll of a die is another discrete random variable, because it could only take one of six possible values. These are kind of silly random variables because they're, the probability mechanics is so conceptually simple. Some more complex random variables are given below. For example, the amount of web, the website traffic on a given day, the number of web hits, would be a count random variable. We'll likely treat that as discrete. However we don't really have an upper bound on it. So it's an interesting kind of discrete random variable. We'll use the Poisson distribution likely to model that. The BMI of a subject four years after a baseline measurement is, a somewhat random example I came up with. And in this case we would likely model BMI body mass index as a continuous random variable. The hypertension status of a subject randomly drawn from a population could be another random variable. Here we might give a person a one if they have hypertension or were diagnosed with hypertension, and a zero otherwise. So, this random variable would likely be modeled as discrete, or would be modeled as discrete. The number of people who click on an ad. Again, this is another discrete random variable, but a, but an unbounded one. But still, we would still assign a problem, probability, for zero clicks, one clicks, two clicks, three clicks and so on. So intelligence quotients are often modeled as continuous. When we talked about discrete random variables, we said that the way we're going to work with probability for them is to assign a probability to every value that they can take. So why don't we just call that a function? We'll call it the probability mass function. And this is simply the function that takes any value that a discrete random variable can take, and assigns the probability that it takes that specific value. So a PMF for a die roll would assign one sixth for the value one, one-sixth for the value two, one-sixth for the value three, and so on. But you can come up with rules that a PMF must satisfy in order to then satisfy the basic rules of probability that we outlined at the beginning of the class. First, it must always be larger than zero because we, larger than or equal to zero, because we've already seen that a probability has to be a number between zero and one, inclusive. But then also the sum of the possible values that the random variable can take has to add up to one, just like if I add up the probability a die takes the value one, plus the probability that it takes the value two, plus the probability it takes the value three, plus the value it takes four, five, six, that has to add up to one, otherwise. The probability of something happening, right, that the die takes one of the possible values, would not add up to one, which would violate one of our basic tenets of probability. So all a PMF does, has to satisfy, is these two rules. We won't worry too much about these rules. Instead, we will work with probability mass functions that are particularly useful, like the binomial one, the canonical one for flipping a coin, and the Poisson one, the canonical one for modelling counts. Let's go over perhaps the most famous example of a probability mass function, the result of a coin flip, the so-called Bernoulli distribution. So let's let capital X be the result a coin flip, where X equals 0 represents talks and X equal 1 represents heads. Here we're using the notation where an upper case letter represents a potentially unrealized value of the random variable. So it makes sense to talk about the probability that x equals 0 and the probability that capital X equals 1. Where as a lower case x is just a placeholder that we're going to plug a specific number into. So in the PMF down here, we have p x equals one half to the x, one half to the one minus x. So if we plug in x equal to 0 we get one half, and if we plug in x equal to 1 we get one half. And this merely says is that the probability that the random variable takes the value 0 is one half and the probability that the random variable takes the value 1 is also one-half. This is for a fair coin. But what if we add an unfair coin? Let's let theta be the probability of a head and 1 minus theta be the probability of a tail, where theta's some number between 0 and 1. Then we could write our probability math function like this. P of x is theta to the x, 1 minus theta to the 1 minus x. Now, notice if I plug in a lower case x of 1 we get theta, and if I plog in a, plug in a lower case x of 0 I get 1 minus theta. Thus for this population distribution the probability a random variable takes the random 0 is 1 minus theta. And the probability that it takes the value of 1 is theta. This is incredibly useful, for example, for modeling the prevalence of something. For example, if we wanted to model the prevalence of hypertension, we might assume that the population or the sample that we're getting is not unlike flips of biased coin with the success probability theta. And just to connect it to what we'll be doing in the future. The issue is that we don't know theta. So, we're going to use our data to estimate this population proportion.