Hey everyone. This lecture is about understanding the hypothesis. We will be making sure that everyone is on the same page about certain aspects of studies and study design, to help us when we discuss the various studies we use as examples in the course. Now that we understand features of experimental design from the last few lectures, we're going to talk about how to translate those features into models. Here are the learning objectives. We will look at the different models we'll see in the course, and we'll talk about how they influence outcomes and hypothesis testing. We're also going to be mapping different features of studies to one another, so you have a design that makes sense and can be effective, by linking goals to hypotheses, hypotheses to models, and models to design. Finally, we will talk about different hypotheses based on within and between factors, and we'll go over interactions. Models are essentially tools we use as researchers to help explain the relationships between variables. They can take values of the independent variables we're studying and tell us what we can expect the dependent variables to be based on those values, helping summarize associations between predictors and outcomes. If you want to do good power and sample size calculations, you have to have a basic understanding of statistical models. As we just talked about, we use models to predict outcomes. This then leads us to generating hypotheses that we want to test. The way it works is that these models give us predicted values of the outcome variables for different values of the predictor variables, summarizing the relationship between the two, or more than two, depending on the model. After the model summarizes the relationship in the data, you can also use it to work backwards and calculate a predicted value of the response variable for each value of the predictor.
Once this relationship is summarized, we can use hypothesis tests to explore whether the real values we find are different from the predicted ones that we expect from the models. Here we see a visual representation of what we're talking about. Suppose I've got only one predictor variable, also known as the independent variable, and only one outcome variable, also known as the dependent or response variable. Let's say the predictor variable is hours spent studying per week and the outcome variable is GPA. That is considered a positive relationship, where on average, the more hours spent studying each week, the higher the GPA will be. If you look at the visual on the right, all those little blue circles are data points. The red line you see is the prediction, which is the best fit line: the line that comes closest to all of the previously observed data points and best represents them. This line of prediction is also the line that represents a model. If you want to check what outcome variable value a certain predictor value will have in a given model, you can use this prediction line. Hypothesis tests often look at the slope of the line of prediction, specifically asking whether the slope is equal to zero or not. The slope is the degree of association between the predictor and the response. If, using our example, hours of study were not associated with GPA, I would expect that line to be flat, with a slope of zero, and the data lying in a random cloud around it. However, we know hours of study are related to GPA, so we're not surprised by the way the data moves in the positive direction that you see. You can tell the data is simulated, though, because of the very tight clustering around the regression line. This is something that we never see in real data. The standardized slope of the line of prediction essentially measures how strongly the predictor is associated with the outcome. The larger the standardized slope, the more strongly related they are.
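As a rough sketch of the idea just described, here is one way to fit a line of prediction and look at its slope in Python. The hours and GPA values are made up for illustration and are not the data behind the slide, and the function name `fit_line` is just a placeholder. It uses ordinary least squares, which is one standard way to get a best fit line.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residuals are the observed values minus the predicted values
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_variance = sum(r ** 2 for r in residuals) / (n - 2)

    # t statistic for the hypothesis that the slope equals zero
    se_slope = math.sqrt(residual_variance / sxx)
    t_stat = slope / se_slope

    # Correlation; with a single predictor this equals the standardized slope
    r = sxy / math.sqrt(sxx * syy)
    return {"slope": slope, "intercept": intercept, "t_stat": t_stat, "r": r}

# Made-up example data: hours studied per week and GPA (assumed values)
hours = [2, 4, 6, 8, 10, 12, 14, 16]
gpa = [2.1, 2.4, 2.6, 2.9, 3.0, 3.3, 3.5, 3.8]

fit = fit_line(hours, gpa)

# Use the line of prediction to get a predicted GPA for 9 hours per week
predicted_gpa = fit["intercept"] + fit["slope"] * 9

print("slope:", round(fit["slope"], 4))
print("t statistic for slope = 0:", round(fit["t_stat"], 2))
print("correlation:", round(fit["r"], 4))
print("predicted GPA at 9 hours:", round(predicted_gpa, 2))
```

A t statistic far from zero is evidence against a flat line. Like the simulated data on the slide, these made-up points cluster much more tightly around the line than real data would.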
The tighter the cluster, the closer the magnitude of the correlation is to one. Our goal is to find a model that is strong enough to represent reality. Models will never perfectly fit reality, but they do their best to approximate it using the information available. Box and Draper said it best when they said all models are wrong, but some are useful. We try to fit the best model to make predictions about what's going to happen in real life. Remember, we use models to perform hypothesis testing, which is how we answer our questions about real life. We want you to be aware of the different terminology that is used for study designs and statistical models. It's useful to understand this diverse terminology because it can assist in interdisciplinary collaboration. On top of that, being familiar with statistical model terminology will prepare you to use GLIMMPSE, the software we will be using for power and sample size calculations in this course. Let's take a look at this terminology. You can see study design terminology on the left, and statistical model terminology on the right. Going across the rows of the table, the role of each element is exactly the same. Scientists in different contexts simply use different terms for them. You may want to pause the lecture on this slide and go through each row of this table. Make sure you have an understanding of each study element and are aware of the different terminology. As we talked about earlier, GLIMMPSE, our power and sample size analysis software, uses statistical model terminology. This table gives you a synopsis of this terminology and examples of what GLIMMPSE could be talking about when it asks for the study elements, using an educational study context. We mentioned predictors before. They are usually our independent variables that affect the outcome variables.
They can be interventional, like treatment assignment, or observational, like gender or something else that cannot be assigned, and they can be fixed or random. Fixed predictors are predictors that are chosen by the design. The classic example of this is treatment assignment. Typically, the goal is to understand the relationships between fixed predictors and response variables. So, we'd want to know whether the treatment affects the outcome variable we are looking at. Fixed predictors are defined by either sample selection or randomization. When we say defined by sample selection, an example would be having sex as a predictor. So, in order to have a good study design, we may recruit 100 male participants and 100 female participants. Treatment assignment is an example of a fixed predictor that is defined by randomization. Random predictors are independent variables that have values that can vary by chance, but are still of experimental interest. The examples we see here are blood pressure and alcohol intake. These are factors where the values are not fixed ahead of time. So, suppose I use alcohol intake as a predictor of liver disease. This is considered a random predictor, because I don't know what predictor values I'm going to get before I run the experiment. There are also factors called covariates. Random covariates impact the outcome variable in some way, but they are not one of the key variables we care about exploring in the study. Once again, we do not know the value of a covariate before we run our experiment. An example of a covariate is something measured at baseline. We don't know what the value will be, and we don't really care about what the value is at baseline, but it can help isolate the true effect of the predictor we actually care about on the outcome by reducing error variance. We talked about repeated measures. So, it should make sense that a repeated measures model involves the outcome variable being measured multiple times.
We're going to talk about a few types of repeated measures. We've talked about longitudinal repeated measures. As a reminder, these are within factors that are measured over time. For example, students' test scores measured at the end of the school year over four years would be a longitudinal repeated measure. Spatial repeated measures are measures taken across some kind of distance, such as at every kilometer of a river. Then, we have multivariate repeated measures. This is when we are measuring multiple outcomes at multiple times. An example of this would be where we are measuring distance walked, heart rate, and blood pressure at the end of the day for a week. These are all strongly correlated with each other, but they are in fact different outcomes. It is worth noting that multivariate tells us there are multiple outcomes, not that there are multiple predictors. Let's talk about models now. There are three models we will look at: the univariate linear model, the multivariate linear model, and the mixed linear model. All of these models have a common structure. First of all, a model is a simple statement about the population in general. We generalize from the sample we observe to a population we hope it represents. The structure looks like this. The response is equal to some prediction plus an error. We use data from the sample to find estimates of parts of the model. We observe the response values or outcomes. The univariate model only has one single outcome measured at one time. These models only work with uncorrelated data, which means no clustering is involved and there's only a single outcome measure. As we just talked about when referring to repeated measures, the multivariate model is used to explore multiple outcomes. This can include repeated measures at different points in time, multiple outcomes measured at just one point in time, or multiple outcomes measured repeatedly at different points in time. This can get complicated.
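To make the common structure concrete, here is one way to write "response equals prediction plus error" for a single outcome and a single predictor. The symbols are an assumption on my part, following common textbook notation, and are not taken from the slides. The second line shows the same structure stacked into matrices for the multivariate case.

```latex
% Univariate model: one outcome y_i for each independent sampling unit i
y_i = \underbrace{\beta_0 + \beta_1 x_i}_{\text{prediction}}
      + \underbrace{\varepsilon_i}_{\text{error}},
\qquad i = 1, \dots, n

% Multivariate model: one column of Y (and of B and E) per outcome
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}
```

In this notation, the coefficients are the parts of the model that we estimate from the sample, and the responses are what we observe.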
The mixed linear model is used for experiments with outcomes like those of the multivariate model, but it is a more flexible model. When we say more flexible, this is because it makes fewer assumptions. So, it can be applied to more situations. The major difference here is that the mixed model can more easily accommodate missing data, a topic that we will discuss later in this course, than the multivariate model can. Unlike the univariate linear model, the multivariate and mixed linear models are used for correlated data. Once again, correlation can be a result of clustering or repeated measures. So, multivariate and mixed models are used for all kinds of combinations of these things, like one level with multiple outcomes, multiple levels with one outcome, or multiple levels with multiple outcomes. Now, we're going to look at a couple of hypotheses. These differ the same way the factors with the same name differ. You'll see what I mean in a minute. Within-independent-sampling-unit hypotheses, or within hypotheses, explore changes over time or space. The example here is measuring the pain level of every patient for six weeks following a procedure. We have a single group, and we want to see how pain changes over time. You can see in the graph that the average pain level for the group appears to decrease over time. Between hypotheses explore differences between groups. Keeping with the pain example, we might be interested in knowing whether the sensory-focused group had a different average level of pain than the standard of care group. In this graph, you can see the standard of care group appears to have reported more pain than the sensory group. Most commonly, however, there will be a combination, a within-by-between hypothesis. This kind of hypothesis explores how an outcome changes over repeated measures, like time, between groups. In this graph, you are able to see the sensory group has a lower average pain level than the standard of care group. You can also see that pain decreases over time.
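The three kinds of hypotheses can be sketched numerically from a small table of group means. The pain scores below are invented for illustration and are not the values from the graphs; the group labels and the choice to compare the first and last time points are also assumptions made to keep the example short.

```python
# Hypothetical mean pain scores at three time points (made-up values)
means = {
    "sensory focus": [6.0, 4.0, 2.0],
    "standard of care": [7.0, 6.0, 5.0],
}

def average(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

# Between hypothesis: do the groups differ, averaging over time?
between_diff = average(means["standard of care"]) - average(means["sensory focus"])

# Within hypothesis: does pain change over time, averaging over groups?
time_means = [average(column) for column in zip(*means.values())]
within_change = time_means[-1] - time_means[0]

# Within-by-between (interaction): does the change over time differ by group?
change_by_group = {group: values[-1] - values[0] for group, values in means.items()}
interaction = change_by_group["sensory focus"] - change_by_group["standard of care"]

print("between-group difference:", between_diff)
print("change from first to last time point:", within_change)
print("difference in change between groups:", interaction)
```

A null hypothesis for each case would say the corresponding quantity is zero in the population. A real analysis would test the whole pattern over time rather than only the first and last time points.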
As you can see, we are able to look at patterns over time across treatment groups. Let's review what we learned in this lecture. We aim to use models that represent reality as closely as possible. They take inputs, the independent variables we are interested in, and based on the values of those inputs, predict what the values of the outputs will be, the dependent or outcome variables of interest in the study. In this course, we talked about the univariate linear model, the multivariate linear model, and the mixed linear model. We use the predicted values from one model to inform the hypothesis tests we explore in other studies. We can explore between hypotheses, like there will be no difference between the outcomes of participants in Treatment A and Treatment B. We can explore within hypotheses, like there will be no difference in daily miles walked over a three-week period. We can explore interactions, or within-by-between hypotheses, like there will be no difference in the pattern of daily miles walked over the three-week period based on the participant's treatment. That wraps up this lecture. Thank you for your time and effort.