In this video we're going to dig into data analysis and talk about how researchers use statistics to test their hypotheses. Right now I'm imagining that most of you skip the results section in papers because there's a bunch of numbers and charts. My hope is that by the end of this video you'll feel more comfortable understanding this information. Remember that there are two types of hypotheses and analyses: descriptive and inferential. Descriptive hypotheses are hypotheses where we're just describing trends. We're not trying to manipulate any data, but just observing the sample and understanding what's happening. In contrast, an inferential hypothesis, which we'll cover in this video, is a hypothesis where you're interested in the difference between groups or in testing the relationship between two variables. You'll hear [INAUDIBLE] talk this week about whether those who participated in his mental toughness intervention improved in their grit and optimism, which is an example of looking for differences between groups. And you'll hear Angela talk about how grit is related to other variables she's been interested in and has tested over the years, like happiness, growth mindset, and purpose. Statistical analysis can help you identify the relationship between two variables of interest or the difference between two groups. With inferential statistics we want to draw inferences about populations from samples. So why do we do this? Well, we would really like to talk about a large set of people, but it's not practical to survey everyone in our population. So we want to be able to generalize what we know about a small set of people to the larger group. In inferential statistics we're really making a numerical guess about the population based on what we know about a sample. So whenever we're doing analysis, we're assessing samples with the hope of generalizing to our population of interest.
This means that there's always a possibility that what we observe is due to chance and not actually a difference or a relationship in the population. So really, statistical analysis is about assessing chance and probability. To use stats to test our hypotheses there are a few steps. First, as we've already discussed, we develop hypotheses, and the data will either support or reject our predictions. Then we collect data from a sample and test our claims, based on this sample, using probability to say something about the situation. And we do this by using what's called a null hypothesis. Remember that the null hypothesis is the statement that an intervention has no effect on an intended outcome, or that there is no relationship between variables. The alternative hypothesis is our prediction that there is a relationship or a difference. If the data that we collect are consistent with our null hypothesis, then we reject our predictions. But if we have enough data to support our predictions, then we reject the null hypothesis. An important note is that we don't ever prove that our predictions are true. Because statistics is based on probability, there's always a possibility that the results are due to chance. So let's take an example. Let's say we implemented an intervention to increase grit with teachers, and we want to know if teachers participating in our intervention have higher grit scores than teachers in the district who did not participate. We would collect grit scores from these two groups and compare the results. To do this, we would test the null hypothesis that there is no difference in grit between teachers in our intervention and other teachers in the district. So let's say this slide represents our results.
The average grit score of teachers not in our intervention is 3.5, and the dot represents the average from our intervention group of 4.5. So there is a difference, and it appears as though teachers in our intervention have higher grit scores. But the question we would then ask is whether that difference is just due to chance, or likely a real difference in teachers across the district. Because this score falls far enough out from the average, it could be in what's called the significance region, which is where we would think the results weren't just due to chance or error in our data collection, but actually a difference worth paying attention to. This is where the concept of statistical significance, or p values, comes in. You'll see p values reported in psychology articles. The p value tells you the probability of observing results at least as extreme as yours if the null hypothesis were true. So in simpler terms, it's telling you the likelihood that your results were just due to chance or error in the data collection. In positive psychology the standard that's applied is .05. So what that means is that there's a 5% likelihood that the difference or relationship was just chance or error. Anything under 0.05 is determined to be statistically significant. So when you hear that a result was significant, that means that a statistical test of some sort was performed and a p value of less than 0.05 was obtained, and that there's only a 5% likelihood that these results would have been due to chance alone. This means that the difference or relationship was different enough from the null hypothesis that we should pay attention to it. So in the case of the visual of our grit intervention, the grit score we observed was far enough out on that curve, or different enough, to not likely have been due to chance. This is where terms in statistics can get tricky, because the term significant has a different meaning in common language than in statistics.
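A test like the one just described can be sketched in a few lines of Python. This is only an illustration: the grit scores below are invented, and a simple normal (z) approximation stands in for the full t-test a researcher would actually run.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical grit scores on a 1-5 scale; these numbers are invented for illustration.
intervention = [4.2, 4.6, 4.5, 4.8, 4.3, 4.7, 4.4, 4.6]
comparison   = [3.4, 3.6, 3.2, 3.8, 3.5, 3.3, 3.7, 3.5]

# Standard error of the difference between the two group means.
se = (stdev(intervention) ** 2 / len(intervention)
      + stdev(comparison) ** 2 / len(comparison)) ** 0.5

# How many standard errors apart the two group means are.
z = (mean(intervention) - mean(comparison)) / se

# Two-sided p value: the probability of seeing a difference at least this
# extreme if the null hypothesis (no real difference) were true.
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"difference = {mean(intervention) - mean(comparison):.2f}, p = {p:.4f}")
print("statistically significant" if p < 0.05 else "not significant")
```

The decision rule is exactly the one described above: if p falls under 0.05, we reject the null hypothesis and treat the difference as worth paying attention to.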
In common language, significant means something is large. But in statistics, the term significant doesn't say anything about the size or magnitude of the effect; it just tells you whether or not the results were likely due to chance. So let's say you have a really big sample. Increasing the size of your sample makes it more likely that your results would be replicated if you ran the study again and were not just due to chance. So if you have a large sample, you might find results to be statistically significant even if they aren't very large. To get at the actual size of the difference or relationship, or the practical significance, you look at a measure called effect size. The effect size will tell you about the size or magnitude of the relationship. If you're interested in testing the difference between groups, as [INAUDIBLE] might be if he was looking at the difference between participants in his mental toughness intervention and a control group, you would use one type of test. Or if you're interested in the relationship between grit and other variables, like Angela has been over the years, then you would run different types of tests. This chart outlines the types of tests and the results you get from them. Now, I'm not expecting that you become an expert on these in this course, but you'll see them cited in articles, so I do want to help you be a more critical consumer of the research. Relationship analyses tell you the association between variables. So, for example, if researchers are testing hypotheses about the relationship between grit and other variables, like happiness or optimism, then they're either running a correlation or a regression, and the effect size measure they would get is Pearson's r for correlation and a beta value for regression. These measures tell you the strength of the relationship between variables. Group difference analyses allow you to understand the differences between groups.
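The relationship analysis just described can also be sketched directly. The paired scores below are invented for illustration, and the function implements the standard Pearson correlation formula (covariance of the two variables, scaled so the result always falls between -1 and 1).

```python
from statistics import mean

# Hypothetical paired scores for ten people; invented for illustration.
grit      = [3.1, 3.5, 2.8, 4.0, 3.7, 3.3, 4.2, 2.9, 3.8, 3.6]
happiness = [3.0, 3.6, 2.9, 4.1, 3.5, 3.2, 4.4, 3.1, 3.9, 3.4]

def pearson_r(x, y):
    """Pearson's correlation coefficient: co-movement of x and y, scaled to -1..1."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(grit, happiness)
# Common rule of thumb for |r|: near 0.1 is small, 0.3 is medium, 0.5 is large.
print(f"r = {r:.2f}")
```

A positive r means higher grit goes with higher happiness in this made-up sample; an r near zero would mean no linear relationship.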
So if researchers are testing hypotheses about group differences, say between an intervention and a control group, then they are either running a t-test for two groups or an ANOVA for multiple groups. So think about the example I shared earlier about teachers who participated in our grit intervention as compared to teachers in the district who did not. The effect size measure they get is a Cohen's d, which tells you the size of the difference between groups. So regardless of the type of analysis, researchers are testing hypotheses and will get a p value, which again, remember, tells them whether their results are likely due to chance or represent a real difference in the population of interest. These effect size measures assess the magnitude of the effect, or the practical significance. I'm going to give you some tips on how to interpret these measures if you see them in articles. This slide shows how to interpret the values of r or d. An r value of 0.1 or -0.1 is small, a value of 0.3 or -0.3 is medium, and 0.5 or -0.5 is large. Cohen's d, which tells you the size of the difference between groups, doesn't have the same limitation in range as r. A Cohen's d of 0.2 or -0.2 is small, 0.5 or -0.5 is medium, and 0.8 or -0.8 is large. But d can be over 1. So I know I'm throwing a lot of numerical values at you, so let's actually look at an example. Here's a table from the True Grit article. This table is showing the correlation, or relationship-based test, where we looked at how grit was correlated with many other variables. If you look across the first row in the table, you'll see the values reported for the relationship between grit and all the other variables we tested. The values represented here are the Pearson's correlation coefficient, or that r value I told you about. So what you'll see is that most of the correlations are very small, with r values of 0.02, 0.16, and 0.08.
The one exception is the relationship between grit and leadership, which has an r value of .36. Now, remember our rule of 0.1, 0.3, and 0.5. This means that there's a moderate positive relationship between grit and leadership, such that grittier teachers also had higher leadership scores. The second thing you'll see in this table is the p value, or the likelihood that the results are just due to chance. The stars let you know that the results have been found to be statistically significant; the more stars, the more significant, and the less likely the results were just due to chance. So if we're looking at which variables are significantly related to grit, the only variable is leadership, because that's the only variable with stars by the value. You can see in the key at the bottom of the table that three stars mean that the p value is less than 0.001. This means that there's a less than 0.1% chance that the results were just due to chance. So we can conclude from this table that there's a positive, moderate relationship between grit and leadership that is unlikely due to chance, or just an error in our data collection. Let's look at another example, this time where we're testing the differences between groups. In this table from True Grit, we're looking at the difference in grit and other variables between teachers who were retained for the year and those who quit mid-year. Our hypothesis was that teachers who stuck with it through the year would have higher levels of grit than those teachers who quit. And the results actually provided evidence to support our prediction. So if you look at the first row of this table, you'll see the difference in the grit rating between retained teachers and those teachers who resigned mid-year. The mean, or average, grit for retained teachers was 3.98, compared to the mean of 2.79 for teachers who resigned. The Cohen's d measures, as you'll remember, the size of that difference. Remember the rule of 0.2, 0.5, and 0.8.
A Cohen's d of 0.79 is a large effect because it's close to 0.8, indicating that the size of the difference in grit between teachers who were retained and those who resigned was practically relevant. The fact that there are three stars means the results are very unlikely to be due to chance. You can see that the p value is less than 0.001, which again means that there's less than a 0.1% chance that the results are a fluke. So we conclude from this table that there is a large difference in grit between teachers who stayed through the year and those that quit mid-year, and that it is unlikely due to chance or an error in our data collection. So in sum, I hope that you now won't just skip over the results section of positive psychology articles. When you're looking at tables in articles, you can look for three things. First, the p value, or statistical significance. This will tell you how likely it is that the results are due to chance. If the p value is less than .05, you'll see stars, which means you can have confidence that it is unlikely that the results are due to chance. Second, the effect size. We've talked about two types of effect size, Cohen's d and Pearson's r, but there are other effect sizes you'll see reported in articles depending on the type of test being run. The effect size measures tell you how practically significant the results are. And finally, always look for the sample size and information on the sample. This will tell you how generalizable the results are to other populations. So this brings us to the end of the final video on analysis. There are a few other components for this week, as usual: researcher videos, featuring Angela Duckworth applying these concepts to her work; practitioner videos, featuring practitioners applying these concepts to their work; a short quiz on key concepts; and two extension activities.
A peer review activity, where you discuss what type of analysis you would run to test the hypothesis you submitted in week one, and an advanced activity, where you look at Table 4 in the True Grit study and interpret our results.
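The Cohen's d interpretation rule used in this video (0.2 small, 0.5 medium, 0.8 large) can be sketched as code, too. The teacher ratings below are invented for illustration; the function uses the standard pooled-standard-deviation formula for d.

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: the difference between group means, in pooled-standard-deviation units."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (((n1 - 1) * stdev(group1) ** 2
                   + (n2 - 1) * stdev(group2) ** 2)
                  / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

def label(d):
    """The rule of thumb from the video: 0.2 is small, 0.5 is medium, 0.8 is large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

# Hypothetical grit ratings for retained vs. mid-year-resigned teachers; invented data.
retained = [4.1, 3.8, 4.3, 3.9, 4.0, 4.2, 3.7, 4.4]
resigned = [3.2, 2.9, 3.4, 3.0, 3.3, 2.8, 3.1, 3.5]

d = cohens_d(retained, resigned)
print(f"d = {d:.2f} ({label(d)})")
```

Note that, unlike r, d has no upper bound: a value well over 1 simply means the group means sit very far apart relative to the spread within each group.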