[MUSIC] What are the objectives of statistical models? What can they accomplish and where do they fit in the broader field of data science? Statistics is concerned with the entire process of planning, collecting and analyzing data to draw scientifically defensible conclusions. Because these objectives are important to a wide variety of disciplines, you might encounter statistics in areas ranging from business to medicine, biology to sports, social sciences to pharmacology. Or agriculture to ecology. This course is about statistical modelling which falls under the analyzing data objective. So what is a statistical model? In this course, we're going to define a statistical model. A statistical model will be a mathematical structure used to imitate, And approximate, The data generating process. It typically describes relationships among variables while accounting for uncertainty and variability in the data. For example, let's consider an experiment where we wish to reveal a cause and effect relationship. If the relationships between variables are complicated or involve variables that we can't measure, the data we collect may appear to have random behavior. We can account for this apparent randomness or uncertainty or variability using the theory of probability. Then, hopefully a statistical model will help us identify structural, such as cause and effect, relationships hidden in the noisy data. As an example of another source of uncertainty and variability in your data, let's consider a poll conducted on a sample of likely voters. Let's say that 57% of those polled are in favor of one particular candidate. When presented with these results, it might be natural to ask, how confident am I in this 57% number? Does it mean that 57% of all people favor that candidate? Well that depends on several other questions related to how well the poll was conducted for example. How were the participants selected? How many people were polled? If we repeated the poll on another randomly selected sample of voters, how might that 57% number change? Using probability theory, we can account for the uncertainty surrounding this estimate. For what kinds of problems might we use a statistical model? Here are four common objectives. The first one we're going to talk about is to quantify uncertainty. In the polling example, we could come up with a range of reasonable estimates to put that 57% number in context. Suppose a statistician analyzes the poll data and concludes that with 99% confidence, the true percentage of all citizens who favor the candidate is between 51 and 63%. That's a little more informative. It gives us an idea of how much we can trust this estimate. The second objective we'll talk about is inference. In the polling example, we only know a percentage of people in the sample who support the candidate. A statistical model helps us extend the result and infer what percentage of the total population supports the candidate? We may also have other demographic information about each person in the poll. A statistical model might allow us to see how these other variables relate to a person's likelihood of supporting the candidate. The next objective is measuring the evidence in the data in support or against a hypothesis. In the polling example, suppose an expert hypothesizes that the candidate is more popular with men than with women. Luckily we have further breakdown of the poll results. Let's suppose that 55% of women favor the candidate and 59% of men favor the candidate. Is this strong enough evidence to support or validate the experts claim? A statistical model could account for the uncertainty in these estimates to help us answer that question. The fourth and final modeling objective that we're going to talk about is prediction. In the polling example, suppose we have demographic information about a particular voter. But we don't know which candidate she supports. We might employ a statistical model to use what is known about about her to make an accurate prediction of her voting selection. It is clear that statistical modelling has much to contribute to data science. Another key player in this arena is machine learning. How does that relate to statistical modeling? Broadly, machine learning uses algorithms, some of which are based on or can be formulated as statistical models, to produce state of the art prediction. That is, machine learning specializes in and excels at this fourth objective. Often, excellent predictions come at the cost of building complicated models that are difficult to interpret. For that reason, such algorithms are frequently referred to as black boxes. Of course, who would believe a model that did not produce realistic predictions? Objective four is almost always important but doesn't necessarily answer all of the questions an investigator might have about their data. Statistical models strive to balance the objectives above to answer the questions at hand. [MUSIC]