Hello and welcome to this video on primary and secondary quantitative data analysis. This is a continuation video on inferential statistics. In this video, we will look into the ANOVA and the linear regression tests. You may refer to the previous video to learn about the other popular tests used for social science research, which include the Chi-square test and the T-test.

The ANOVA test is otherwise known as the analysis of variance. This test compares whether the mean scores on an interval or scale type of variable differ for three or more groups that are measured as nominal variables. The ANOVA test tells us whether three or more means are the same. It tests the null hypothesis that all group means are equal. Just like the other inferential tests, some conditions have to be met before we perform the test. Firstly, the dependent variable should be approximately normally distributed. Secondly, you should have independence of observations; this is similar to the explanation I gave in the previous video when looking into the Chi-square test and the T-test. Lastly, there should be homogeneity of variance.

An ANOVA result produces an F-statistic. This refers to the variance between the groups divided by the variance within the groups. If F is significantly greater than one, the variance between the groups is larger than the variance within the groups, meaning the greater part of the total variance is due to the differences between the groups. This would mean that we reject the null hypothesis and accept its alternative. If F is less than or close to one, the variance between groups and the variance within groups are roughly equal. This means that the groups do not differ from each other, and in this case we would accept the null hypothesis. When there is an overall statistically significant difference in group means, or in other words, if the alternative hypothesis is accepted, you would run a post hoc test. This test is run to confirm where the differences occurred between the groups. A short code sketch of a one-way ANOVA follows at the end of this segment.

The last inferential test we will look into is the linear regression test. This test is performed to check if there is a significant linear relationship between two continuous variables. But before we go in depth on the linear regression test, I would like to point out that in some cases regression is confused with correlation. Note that correlation does not imply causality. Consider the image here. A correlation test informs us about the strength of the relationship between two variables. This could be between two dependent variables, two independent variables, or between a dependent and an independent variable. The correlation coefficient ranges from negative 1 to positive 1, where values closer to negative 1 imply a strong negative correlation, and values closer to positive 1 imply a strong positive correlation. Zero implies no correlation between the variables. Causality, on the other hand, informs us about the extent to which a change in the independent variable would cause a change in the dependent variable. Note that the direction is always from the independent to the dependent variable, meaning that a one-unit change in the independent variable will cause a certain degree of change in the dependent variable, and not the dependent variable causing a change in the independent variable. Now, with reference to causality, let's assume you want to test whether there is a linear relationship between students' grades and individual study time.
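Before we turn to that regression example, here is a minimal sketch of the one-way ANOVA described above, written in Python with SciPy rather than SPSS. The three groups of scores are entirely hypothetical, invented only to illustrate the mechanics of the test.

```python
from scipy import stats

# Three hypothetical groups of student scores (e.g. three teaching
# methods); the numbers are invented for illustration only.
group_a = [72, 85, 78, 90, 66, 81]
group_b = [68, 74, 70, 79, 65, 72]
group_c = [88, 92, 85, 95, 90, 87]

# f_oneway returns the F-statistic (between-group variance divided by
# within-group variance) and its p-value.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# If p < 0.05, we reject the null hypothesis that all group means are
# equal and run a post hoc test (here Tukey's HSD) to see where the
# differences occurred.
if p_value < 0.05:
    print(stats.tukey_hsd(group_a, group_b, group_c))
```

A large F-statistic with a p-value below 0.05 leads us to reject the null hypothesis, and the Tukey post hoc comparison then shows which pairs of groups differ. Now, back to the regression example.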
The equation for linear regression is as shown here, Y = B_0 + B·X, where Y is the outcome we want to predict, or in other words the dependent variable, which in this case is the student's grade. B-naught, or B_0, is the intercept, otherwise known as the constant of the line at the Y axis, meaning the student's grade if they do not study individually, or in other words, the point at which the line crosses the vertical axis of the graph. B, or beta, is the gradient, slope, or regression coefficient. Usually, a line with a positive gradient describes a positive relationship, whereas a line with a negative gradient describes a negative relationship. And X here is the value of the predictor, or in other words your independent variable, which in this case is individual study time.

With regression, we strive to find the line that best describes the data collected, then estimate the gradient and the intercept of that line. Having estimated these values, we can insert different values of our predictor variable into the model to estimate the value of the outcome variable. Linear regression estimates the equation of the line of best fit using a technique called least squares. The least squares line is the line that best describes the data collected, meaning the line that has the smallest sum of squared vertical distances from the observed points to the line. This line of best fit is called the regression line, and it plots the best fit through all the points on the plot.

There are also some conditions that have to be met with a linear regression, just like with the other inferential tests. Firstly, the dependent variable should be continuous data. Secondly, there needs to be a linear relationship between the two variables. Thirdly, your dependent variable should be approximately normally distributed. Fourthly, there should be homogeneity of variance. And lastly, you should have independence of observations.

Now, the outputs of the linear regression inform you about the magnitude of effect that a change in the independent variable would have on the dependent variable. Let's look at the outputs here from a linear regression performed in SPSS. The model summary with the R-squared is a statistical measure of how close the data points are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. The value is interpreted as a percentage, where zero percent indicates that the model explains none of the variability of the response data around its mean, and 100 percent would indicate that the model explains all of the variability of the response data around its mean. You may wonder what the adjusted R-squared next to the R-squared value means. Usually, the more independent variables you include in the model, the higher the R-squared value gets. The adjusted R-squared is a modified version of the R-squared that has been adjusted for the number of independent variables in the model. The value increases only if a newly added independent variable improves the model more than would be expected by chance, and decreases when the independent variable does not improve the model. The next table we look at is the coefficients table. In this table, we pay attention to the significance level and the beta, B, also known as the coefficient or the magnitude of change. A minimal code sketch of such a regression follows below.
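Before interpreting these outputs, here is a minimal sketch of the same kind of simple linear regression in Python with the statsmodels library. The study-time and grade values are hypothetical; SPSS is a point-and-click tool, so this is an equivalent way of producing the model summary and coefficients table, not the procedure shown in the video.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: weekly individual study hours and final grades.
study_hours = np.array([2, 4, 5, 7, 8, 10, 12, 14, 15, 18])
grades = np.array([52, 55, 58, 60, 64, 66, 71, 74, 77, 83])

# add_constant adds the intercept term B_0; OLS then fits
# Y = B_0 + B * X by least squares.
X = sm.add_constant(study_hours)
model = sm.OLS(grades, X).fit()

print(f"R-squared: {model.rsquared:.3f}")               # model summary
print(f"Adjusted R-squared: {model.rsquared_adj:.3f}")
print(f"Intercept B_0: {model.params[0]:.3f}")          # coefficients table
print(f"Slope B: {model.params[1]:.3f}")
print(f"p-value for the slope: {model.pvalues[1]:.4f}")
```

Calling model.summary() would print a combined table that closely resembles the SPSS model summary and coefficients tables described above.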
The interpretation of the significance level for the linear regression is the same as for the other statistical tests: a value greater than 0.05 means that we accept the null hypothesis, and a value less than 0.05 means that we reject the null hypothesis and accept its alternative. In this example, the significance level is less than 0.05, meaning that there is a significant relationship between grades and individual study time. Now, looking at the beta value of 0.023 in this output table, we see that the outcome is positive. If it were negative, it would have a negative sign before the value, so it would read -0.023. Since we have a positive outcome, we can rightfully say that for every additional unit of time a student invests in studying individually, the grade increases by 0.023 units, or 2.3 percent. This is what is referred to as the magnitude of effect.

You now have a basic idea of the different kinds of tests you could perform in quantitative analysis, and also that correlation does not imply causality. The tests that I discussed in this video and in the previous one are not exhaustive. There are many kinds of tests that you can perform on your data; I simply mentioned the most common types of tests that are performed in social science research. I would like to encourage you to read more about these tests. You can find a lot of online literature about them, as well as online tutorials that explain how to perform these tests.

That's all for this video. This brings us to the end of this online course on research methods. I hope that you now have the basic information that you require to perform both qualitative and quantitative research. Remember that both types of research require very keen attention to the details of data collection, data preparation, and data analysis. All the best as you perform your research, and I hope you get wonderful results.