0:14

So, at this point you've done a lot of exploratory data analysis, and you probably have a reasonable sketch of the solution to your question. But you may not be so sure whether that solution will hold up to challenges, or whether it will be sensitive to small changes in the data or the model.

And so, formal modeling is the process of specifying your model very precisely, so that you know what you're trying to estimate and you know how you can challenge your findings in a rigorous framework. Usually the way that we write down these models is with mathematical notation or with computer code, so that we can get very specific about what we're trying to do. Using words alone to describe a model usually isn't precise enough, so that's why we usually have to resort to mathematics or computer code.

Now, when we talk about formal models, very often we'll talk about parameters, and parameters play an important role in statistical modeling. Parameters usually represent characteristics of the population. They might describe relationships between variables, or certain features like means or standard deviations. Because they are characteristics of the population, they're generally considered to be unknown, so we have to estimate them from the data. Often the goal of an analysis is to estimate particular parameters.

And the nice thing about formal modeling is that it allows you to specify which parameters you're interested in and are trying to estimate. So for example, in a linear regression, the coefficients of the regression model are parameters, and very often your goal is to estimate them.
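
To make the regression example concrete, here is a minimal Python sketch; it is an illustration and not something shown in the lecture. It assumes a simple model y = b0 + b1 * x + noise, simulates data from made-up "population" values (TRUE_INTERCEPT and TRUE_SLOPE are purely hypothetical), and then estimates the two coefficients from the data with the usual least-squares formulas.

```python
import random

# Hypothetical population parameters; in a real analysis these are unknown.
TRUE_INTERCEPT = 2.0
TRUE_SLOPE = 0.5

rng = random.Random(42)  # fixed seed so the simulation is repeatable
x = [rng.uniform(0, 10) for _ in range(500)]
y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + rng.gauss(0, 1) for xi in x]


def fit_line(x, y):
    """Return least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


est_intercept, est_slope = fit_line(x, y)
print(f"estimated intercept: {est_intercept:.2f} (simulated truth {TRUE_INTERCEPT})")
print(f"estimated slope:     {est_slope:.2f} (simulated truth {TRUE_SLOPE})")
```

The estimates should land close to, but not exactly on, the values used to simulate the data, which is the sense in which parameters are estimated rather than known.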

So, the general framework for doing formal modeling is very similar to what we've been talking about all along in this course. The three basic steps are setting expectations, collecting information, and then revising those expectations based on what you see.

So for setting expectations, typically you want to have a primary model, which is your best sense of what the solution should be and what the answer to your question is. This is the leading candidate based on all the current information you have and on any exploratory analysis that you've done. Now, it's not necessarily going to be the final model; it's the model that you start with to structure your formal analysis. You may update it later based on the information that you collect, and you may change your primary model to something different. That's okay, so you don't have to worry too much about what your primary model is going to be from the get-go. Just pick one that seems reasonable given the data and provides a sensible solution. Then you'll be testing it in various ways to see if it's going to work.

2:45

So the next stage is collecting information. Given that you have a primary model, you want to develop a series of what I call secondary models to test and challenge your solution. The basic idea of a secondary model is that it's slightly different from your primary model. It may add variables, it may remove variables, or it may use different functional forms. So it looks kind of like your primary model, but with variations. Those are the secondary models that you want to look at.

The goal is to generate evidence against your primary model. If you can generate evidence that suggests your primary model is incorrect, then you can update your thinking and try to come up with a different primary model. Another term for this is sensitivity analysis: you want to think about how sensitive your primary model is to the various changes introduced via the secondary models. We'll talk more specifically about what this means when we get to our examples.
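
The idea of challenging a primary model with secondary models can be sketched in Python. This is an illustrative sketch rather than an example from the lecture: the data are simulated, the variable names x, z1, and z2 are hypothetical, and the 10% cutoff for "consistent" is an arbitrary assumption, since what counts as consistent depends on the application. It fits a primary model with only the key variable, fits three secondary models that add variables, and compares the coefficient on the key variable across them.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    return [A[r][p] / A[r][r] for r in range(p)]


rng = random.Random(1)
n = 400
x = [rng.gauss(0, 1) for _ in range(n)]   # key variable
z1 = [rng.gauss(0, 1) for _ in range(n)]  # extra variable related to y
z2 = [rng.gauss(0, 1) for _ in range(n)]  # extra variable unrelated to y
y = [1 + 2 * xi + 0.5 * z1i + rng.gauss(0, 1) for xi, z1i in zip(x, z1)]

# Index 1 of each fit is the coefficient on x (index 0 is the intercept).
primary = ols([x], y)[1]
secondary = {
    "x + z1": ols([x, z1], y)[1],
    "x + z2": ols([x, z2], y)[1],
    "x + z1 + z2": ols([x, z1, z2], y)[1],
}

THRESHOLD = 0.10  # assumed cutoff for "largely consistent"
print(f"primary (x only): {primary:.3f}")
for name, coef in secondary.items():
    change = abs(coef - primary) / abs(primary)
    verdict = "consistent" if change < THRESHOLD else "challenges primary"
    print(f"{name:12s} {coef:.3f}  relative change {change:.1%}  {verdict}")
```

In this simulation the extra variables are generated independently of x, so the coefficient on x should barely move between models. With real data, a large shift in that coefficient would be the kind of evidence against the primary model described above.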

The last step, of course, is revising your expectations. If the secondary models are largely consistent with your primary model (and exactly what "consistent" means depends on the context, your application, and the question that you're asking), then that's great. You can either move on to the next phase, or maybe you're finished and you can just record your results.

However, if your secondary models successfully challenge your primary model and put some of your initial conclusions in doubt, then you may need to adjust or modify the primary model to better reflect the additional evidence that you've generated via the secondary models. Then you can start the process again with a new primary model, and perhaps a new set of secondary models to challenge it. You can iterate through this process until you arrive at a model that you think reasonably summarizes your data and answers the question that you set out to ask.

4:36

So there are two basic types of situations in which we often use formal modeling, and I'm going to categorize them into what I call associational analyses and prediction analyses.

In an associational analysis, the aim is to look at the association between two, or maybe more, features in the presence of many other potentially confounding factors. For the most part we're interested in the association between two things, but there may be other things that we have to adjust for, and we want to make sure we're accounting for them properly.

There are three classes of variables in an associational analysis. The first is the outcome. The outcome is the factor or feature that we think varies along with what I call a key predictor. Now, it may not actually respond to the key predictor in a causal sense, but the idea is that when the key predictor changes, the outcome changes along with it, whether the relationship is causal or not.

The next class of variable is the key predictor. This is usually the predictor of interest, and what we want to know is how the outcome varies with it. There may be one, or two, or even three key predictors that you're primarily interested in, but usually the number will be small, and very often it's just one. So the key predictor explains some variation in the outcome, and it's something that you're interested in.

The last class, which is a very large class of variables, is all the potential confounders. These are things that are associated with your key predictor and also associated with your outcome, and they generally serve to confuse the association between your key predictor and your outcome. So these are things that often need to be accounted for in some way, or included in the formal model, to properly examine the relationship between your key predictor and your outcome.
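
A small simulation can show why confounders matter. The following Python sketch is illustrative only and is not from the lecture: it invents a confounder z that drives both the key predictor x and the outcome y, with an assumed true coefficient of 0.5 on x. It compares the unadjusted slope of y on x with a slope adjusted for z. The adjustment here is done by residualizing both x and y on z (the Frisch-Waugh-Lovell approach), which gives the same coefficient on x as including z in a multiple regression.

```python
import random


def fit_line(x, y):
    """Return least-squares (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def residuals(x, y):
    """Return what is left of y after removing its linear fit on x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


TRUE_EFFECT = 0.5  # assumed coefficient on the key predictor in this simulation

rng = random.Random(7)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]   # confounder
x = [zi + rng.gauss(0, 1) for zi in z]    # key predictor, partly driven by z
y = [TRUE_EFFECT * xi + 2 * zi + rng.gauss(0, 1)
     for xi, zi in zip(x, z)]             # outcome, driven by x and z

# Unadjusted: ignores z, so the slope absorbs part of z's effect.
_, unadjusted = fit_line(x, y)

# Adjusted: strip z out of both x and y, then relate what remains.
_, adjusted = fit_line(residuals(z, x), residuals(z, y))

print(f"unadjusted slope: {unadjusted:.2f}")
print(f"adjusted slope:   {adjusted:.2f} (simulated truth {TRUE_EFFECT})")
```

With these simulated values the unadjusted slope should come out well above 0.5, while the adjusted slope should sit close to it.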

The other kind of analysis that you might do is a prediction analysis. A prediction analysis differs from an associational analysis because the goal here is really to use all available information to predict an outcome. You usually don't care about the mechanism, or how things work, when you're doing a prediction analysis. So you don't necessarily care about explaining how a variable predicts a given outcome. You just want to be able to generate a good prediction from a set of variables, and you're not trying to develop a detailed understanding of the relationships between all the features.

Now, for most prediction analyses there really isn't a distinction between a key predictor and a bunch of other potential confounders. Usually all predictors are considered equally, because any of them may contribute information for predicting the outcome. Before we do the analysis, we may not weight them any differently; we may weight them all equally, because we don't have any a priori information about which might be more or less important. Any good prediction algorithm will help you determine which variables are useful for predicting the outcome and which aren't.

7:38

Another feature of prediction analyses is that the model you use usually cannot be written down in convenient mathematical notation. Often the only way the procedure can be specified is through computer code or through some algorithm. And so, often there are no parameters of interest. You're not trying to estimate any parameters; you're just trying to make a prediction using any combination of features, any functional form, and any type of model. That's kind of a hallmark of prediction analysis.

And finally, prediction analyses are often what you might call classification problems, where the outcome takes one of two different values and you're trying to predict which of those two values it will be.
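
As a hedged illustration of a two-class prediction problem (not an example from the lecture), the Python sketch below simulates two features for each class, trains a nearest-centroid classifier on part of the data, and measures accuracy on held-out points. The nearest-centroid rule, the simulated class centers, and the 600/200 train/test split are all assumptions chosen for brevity. The procedure is specified as an algorithm, and the only quantity examined is how well it predicts.

```python
import random

rng = random.Random(3)


def simulate(label, center, n):
    """Draw n points with two features around a hypothetical class center."""
    return [((rng.gauss(center[0], 1), rng.gauss(center[1], 1)), label)
            for _ in range(n)]


# Two classes with made-up centers; both features are treated equally.
data = simulate(0, (0.0, 0.0), 400) + simulate(1, (2.0, 2.0), 400)
rng.shuffle(data)
train, test = data[:600], data[600:]


def centroid(points):
    """Average each of the two features over a list of points."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(2))


# "Training" just records the mean feature vector of each class.
centroids = {
    label: centroid([feats for feats, lab in train if lab == label])
    for label in (0, 1)
}


def predict(feats):
    """Predict the class whose centroid is closest in squared distance."""
    def sq_dist(label):
        return sum((f - c) ** 2 for f, c in zip(feats, centroids[label]))
    return min(centroids, key=sq_dist)


correct = sum(predict(feats) == label for feats, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Evaluating on points that were not used for training is what makes the accuracy a measure of prediction rather than of fit.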

So that's a summary of what formal modeling is used for and the contexts in which it may be used, whether that's associational or prediction analyses. In the next lectures I'll give examples of associational and prediction analyses, and show how the formal modeling framework can be used to work through these different kinds of analyses.
