Analysis of variance, often abbreviated ANOVA, and experimental design are important statistical tools for understanding the relationships between several variables. In particular, they can help us detect group differences with respect to the mean of some continuous variable. The conclusions drawn from ANOVA and from ANCOVA, the analysis of covariance, which we'll also study in this course, can tell us about associations or correlations between the variables in question. And under certain experimental conditions, we can use these statistical tools to infer causal relationships. Those conditions have to do with the design of the experiment, that is, the way the data were collected.
In this course we'll study both sides. We'll study ANOVA and ANCOVA, which are the mathematics of detecting average group differences. And we'll study the experimental design features that allow us to collect data in a way that can help us infer causal relationships.
To better understand the tools used in this course, let's consider an experiment that was published in the Journal of Food Engineering. The goal of this experiment is to evaluate and compare the quality of espresso coffee made using three different extraction procedures. The study measured the foam index of each shot of espresso. Foam, according to the authors of the study, is a distinctive feature of espresso coffee that is absent from other coffee brews, and consumers expect a nice layer of foam on top of their espresso. So the authors measure the quality of espresso using this foam index.
Quality in this experiment has to be operationalized, and the authors do so with the foam index, on the view that more foam makes for a better espresso.
In general, the basic unit of a statistical experiment is called an experimental unit. The espresso dataset includes 27 experimental units, namely the shots of espresso. Experimental treatments are conditions that researchers control and administer to the experimental units. In this study, the shots of espresso were prepared under three different treatments, namely the three extraction procedures, or brewing methods. The response variable is the outcome of interest measured on each experimental unit. In the espresso study, the response variable is the foam index.
Note that this study is experimental because the researchers have control over the treatment, namely the brewing method, and they observe the changes in the response. We might contrast this with an observational study, in which researchers can't control the administration of any treatment to the units in the sample. Instead, researchers collect and analyze data without changing the conditions. In experimental studies we have an easier time inferring causal relationships, which we'll talk about later in the course. For observational studies, there are methods that attempt to infer causal relationships, but we'll stick to experimental methods in this course.
In general, there are different ways to visualize the distribution of a continuous variable across different groups, including box plots and violin plots. Here I'm showing a violin plot, and there are some important characteristics to a violin plot.
First, for a fixed brewing method, which is one of the columns in this plot, the values of foam index are plotted as faded grey dots. The black curves on either side of those dots are an estimate of the distribution of foam index conditional on the method in question. The large dots (gold, grey, or black) at the center of each distribution represent the sample mean of the foam index within each group, or method. And the lines extending outward from the sample mean represent two standard deviations from the mean. So these plots tell you quite a bit, and they are a nice summary of the distribution of the response across the different groups.
A really important research question in this experiment is: are there differences with respect to the true mean foam index across the different brewing methods? This question might matter because a coffee shop might adopt a brewing method different from the one it currently uses if the foam index is higher under the new method. Our first attempt at answering this question could be something like this violin plot. Visually, we can see that the sample means, which we might call x-bar, are different. But we also know that there is a relatively small number of measurements: in this experiment there were only 27 experimental units. So it may be the case that if we conducted this experiment again, the values of foam index across groups would be different.
So visually assessing a plot like this one, a violin plot or a box plot, for differences across sample means doesn't tell us the whole story about differences across population means. What we'd really like to know is whether there is a difference in the means across these groups in the population, and not just in the sample. To do that, we need to do some statistical inference.
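To see why sample means alone can mislead, here is a minimal simulation sketch. It is not the study's data or analysis: the true means, the standard deviation, and the normal distribution are all made-up assumptions, chosen so that the three methods have the same true mean. Only the group size of nine matches the experiment.

```python
import random
import statistics


def simulate_group_means(true_means, sd, n_per_group, rng):
    """Simulate one experiment and return the sample mean of each group."""
    return [
        statistics.mean([rng.gauss(mu, sd) for _ in range(n_per_group)])
        for mu in true_means
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical setting: all three methods share the SAME true mean.
true_means = [50.0, 50.0, 50.0]  # made-up values, not from the study
sd = 10.0                        # made-up within-group standard deviation
n_per_group = 9                  # nine shots per method, 27 in total

# Repeat the "experiment" many times and record the gap between the
# largest and the smallest sample mean each time.
spreads = []
for _ in range(1000):
    means = simulate_group_means(true_means, sd, n_per_group, rng)
    spreads.append(max(means) - min(means))

print("median gap between largest and smallest sample mean:",
      round(statistics.median(spreads), 2))
```

Under these assumptions the sample means differ in essentially every simulated experiment, even though the true means are identical. That is the gap between sample and population that statistical inference has to account for.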
To make an inference about true means, we need to take into account the variability of the sample mean across different possible samples, or in this case, across different possible experiments. We can do this with analysis of variance. We'll learn the mathematics of analysis of variance, and how to interpret the results, as this course progresses.
It's worth noting that researchers might also want to control for other variables that might affect the response. For example, in this espresso experiment, all espressos were made with the same commercial brand of mineral water. But suppose that they weren't. Suppose that waters with different pH levels could have been used in this experiment. The pH level could possibly impact the foam index, so we might add a measurement of the water pH to our experiment and to the statistical model. Models that help researchers discover whether there are differences with respect to the mean of a continuous variable across treatments or groups, while controlling for other continuous variables like the pH of the water, are analysis of covariance, or ANCOVA, models. We'll also talk about these later in the course.
To make the further inference that the difference in brewing method caused the difference in foam index, we need to rely on further statistical concepts like randomization and replication. Randomization is the use of a chance process to assign treatments to experimental units, and it is a reliable method for creating homogeneous treatment groups. Without randomization, we run the risk of applying the treatment to experimental units in some systematic way that would affect the causal conclusions we draw from our experiment. So randomization helps us eliminate that systematic way of assigning the treatment.
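The chance assignment described above can be sketched in a few lines. This is only an illustration: the function name `randomize_treatments` and the labels `method_1`, `method_2`, `method_3` are placeholders, and nothing here describes how the published study actually assigned its brewing methods.

```python
import random
from collections import Counter


def randomize_treatments(n_units, treatments, rng):
    """Assign treatments to units by chance, with equal replication."""
    if n_units % len(treatments) != 0:
        raise ValueError("n_units must be a multiple of the number of treatments")
    reps = n_units // len(treatments)
    # Start from a balanced list (each treatment repeated `reps` times),
    # then shuffle so that chance decides which unit gets which treatment.
    assignment = [t for t in treatments for _ in range(reps)]
    rng.shuffle(assignment)
    return assignment


rng = random.Random(42)  # fixed seed only so the example is repeatable
treatments = ["method_1", "method_2", "method_3"]  # placeholder labels

# assignment[i] is the treatment given to experimental unit i.
assignment = randomize_treatments(27, treatments, rng)
counts = Counter(assignment)

for unit, treatment in enumerate(assignment[:5], start=1):
    print(f"shot {unit}: {treatment}")
print(dict(counts))
```

Shuffling a balanced list keeps nine units per treatment while leaving the order to chance, so no unit characteristic, such as which water happens to be used, can line up systematically with one method.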
As one example, suppose that we always used the bar machine method, one of our possible brewing methods, with high-pH water. This lack of randomization would mean that we would have a very difficult time teasing apart the effect of the bar machine method from the effect of the pH level on the foam index. So we want to assign at random; that way we eliminate this possible covariation.
Replication is another important concept in the design of experiments. Replication is the process of independently assigning the treatments to several experimental units. In the espresso experiment above, each method has nine replications. Replications show that the results are reproducible, and they help researchers estimate important error terms in the model.
In the next few videos we'll define the one-way ANOVA and ANCOVA models, and we'll characterize them as linear regression models. In fact, they really are just regression models of a certain type. We'll see that the single-factor, or one-way, ANOVA model is a multiple linear regression model with a continuous response and, for three groups as in the espresso study, two predictor or explanatory variables that indicate group membership. An ANCOVA model just adds continuous predictors to that ANOVA model.
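To make the regression view concrete, here is a small sketch of how three groups turn into two indicator (dummy) predictors. The foam index values and the labels `method_1` to `method_3` are made up for illustration and are not the published data. The coefficient formulas rely on a standard fact about least squares with dummy coding: the intercept equals the baseline group mean, and each slope equals that group's mean minus the baseline mean.

```python
import statistics


def dummy_code(level, levels):
    """Indicator predictors for `level`, treating levels[0] as the baseline."""
    return [1 if level == other else 0 for other in levels[1:]]


# Made-up foam index values, for illustration only.
data = {
    "method_1": [30.0, 34.0, 32.0],
    "method_2": [40.0, 44.0, 42.0],
    "method_3": [55.0, 53.0, 57.0],
}
levels = list(data)
group_means = {m: statistics.mean(v) for m, v in data.items()}

# Least-squares coefficients of the dummy-coded one-way model:
# intercept = baseline mean, slopes = differences from the baseline mean.
b0 = group_means[levels[0]]
slopes = [group_means[m] - b0 for m in levels[1:]]


def fitted(level):
    """Fitted value of the regression model for one group."""
    x = dummy_code(level, levels)
    return b0 + sum(b * xi for b, xi in zip(slopes, x))


for m in levels:
    print(m, dummy_code(m, levels), fitted(m), group_means[m])
```

Each group's fitted value is just its sample mean, which is why a one-way ANOVA with three groups can be written as a regression on two indicator variables. An ANCOVA model would add a continuous column, such as water pH, alongside the indicators.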