The purpose of this course is to teach you the basics of research, so you can be a more critical consumer of the research in positive psychology and think about how to use and apply it to your work and your life. This week, we're going to be talking about data collection methods and, in particular, looking at the strengths and some of the limitations of how data is collected in the field of positive psychology. The videos for this week will be broken up into three parts. The first video will provide an overview of measurement more generally. In the second video, we will talk about quantitative methods of data collection. And in the third and final video, we'll talk about qualitative methods of data collection. For both of these forms of data collection, we'll talk about strengths and weaknesses, and how you can apply this learning to the interpretation of positive psychology articles or to the development of your own measures.

Even among scholars, there are varying perspectives on the quality of data collection methods. Some scholars, like Kelvin, believed that measurement, and in particular numerical measurement, was essential to success. As you see depicted on this slide, Kelvin stated that when you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind. It may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be. In contrast, Einstein did not place as much stock in measurement as Kelvin. As he shared, not everything that counts can be counted, and not everything that can be counted counts. This is particularly the case in positive psychology, where we want to measure outcomes like happiness, well-being, and grit, for example.
These aren't always easy outcomes to quantify, but that doesn't mean they're not critically important. So I'm not sure what Kelvin would say about this challenge.

When you're measuring constructs in positive psychology, you're trying to find ways to operationalize them. Operationalization is just a fancy word for defining a variable in a way that is measurable. In the case of self-control, for example, there are many different ways you might operationalize the variable. You might have a self-report questionnaire where individuals report on their level of self-control. In the case of students, you might have parent or teacher ratings, where parents or teachers report on kids' level of self-control. Or you might decide not to use reported measures at all, but to actually design a task for individuals to perform to demonstrate their level of self-control. Take, for example, the delay-of-gratification test. A form of this, the marshmallow experiment, is a famous study of this concept conducted by Walter Mischel at Stanford University. In the 1960s, a group of four-year-olds were given a marshmallow and promised another marshmallow only if they could wait 20 minutes before eating the first one. Some children could wait and others could not. The researchers then followed the progress of each child into adolescence and demonstrated that those with the ability to wait were better adjusted and more dependable, as determined by surveys of their parents and teachers, and also scored an average of 210 points higher on the SAT.

Where I think Einstein had it right is that there's no perfect measure for many constructs. What you're doing when you're measuring in positive psychology is trying to reduce error so you can be more confident in your results. There are two types of error: random error and systematic error. Random error is unavoidable; it reflects unknown or unexpected changes associated with measurement.
So perhaps a child happens to be sick on the day of the marshmallow test. This couldn't have been anticipated, but it could affect the results. Or perhaps parents are given the survey on their child's self-control on a day when they are particularly frustrated with their child, and so they don't report their typical behavior. It's also hard to ask questions about self-control in a really precise way, which means that anyone self-reporting, or reporting about others' self-control, may give inconsistent responses. This type of random error is unavoidable.

Systematic error, on the other hand, occurs when there is something wrong with the instrument or the data itself. Maybe, for example, the questions on the self-control self-report scale are worded differently than those on the parent- or teacher-reported scales, which might systematically lead to different responses. Or perhaps the literacy level of the students taking the survey means that they don't understand what it's asking, and so they don't answer the questions as intended. Or, in the case of task-based measures like the marshmallow test, perhaps the experimenter reads the instructions incorrectly to the kids, or reads them differently for different groups of kids. These are more significant types of error because we know they bias results, but they can be better controlled for in advance.

Remember the term we learned about last week: validity. You'll recall that validity is that fancy statistical word for truth. It tells you how much you can trust the results of a particular study. Specifically, internal validity captures whether the study can demonstrate that the independent variable, or the predictor, causes the dependent variable, or the outcome. One important piece of internal validity that we didn't talk about last week is the quality of the measures. Is the measure actually capturing what it's intended to capture? There are three different types of measurement validity to consider.
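Before moving on to validity, the difference between random and systematic error described above can be sketched in a quick simulation. This is not from the lecture; the "true" score, the noise level, and the 10-point bias are all invented for illustration.

```python
import random

random.seed(42)
TRUE_SCORE = 50.0  # hypothetical "true" self-control score on a 0-100 scale

def noisy_measure():
    # Random error: unpredictable noise that tends to cancel out across measurements
    return TRUE_SCORE + random.gauss(0, 5)

def biased_measure():
    # Systematic error: a flawed instrument (e.g., confusing wording) shifts every score
    return TRUE_SCORE + random.gauss(0, 5) - 10

n = 10_000
mean_noisy = sum(noisy_measure() for _ in range(n)) / n
mean_biased = sum(biased_measure() for _ in range(n)) / n

print(round(mean_noisy))   # ~50: random error averages away with enough data
print(round(mean_biased))  # ~40: the bias persists no matter how much data you collect
```

This is why systematic error is the more serious problem: collecting more data shrinks the random noise, but the biased instrument stays 10 points off.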
In the case of positive psychology, the most typical type of measurement validity we consider is construct validity: whether a measure captures the construct it's intended to. So, in the example of grit, if we're asking questions about people's passion and perseverance, are we getting information on grit? Or could we also be getting information on some other construct, like their leadership ability or their more general conscientiousness? When you're reading a positive psychology article, see if the authors included the actual questions they asked on particular scales they developed, and see what you think about how valid they are. Do you think they actually captured the construct, or something else?

The second type of validity is content validity. This typically comes up when you think about assessments. In other words, did the assessment capture the right content? If we gave a test at the end of this MOOC, for example, on all the key concepts you will have learned, did we include all of the relevant content from each week, or did we miss critical objectives?

The final type of validity to consider with measures is their predictive validity. This is not something that you can know in the moment, but it's something you can learn about over time: as research on a particular measure grows, we build this knowledge base. Predictive, or criterion, validity tells us whether a measure predicts an outcome that we care about. The fun example I always share is whether passing the driving test is an effective predictor of whether you can actually drive well. Now, I personally aced my driving test, but my family and friends will tell you that I'm not a particularly good driver, and I'm a terrible parallel parker. So I don't think that passing the driving test is a very valid measure of my driving performance.
In the case of positive psychology, we might be interested in questions about whether short-term measures predict long-term measures of a particular construct. For example, does students' level of grit predict whether they persist in challenging activities? Does whether they graduate from high school actually serve as an effective predictor of how well prepared they are for college? When you're looking at measures in a positive psychology study, you need to think about how valid they are, because that will tell you how valid the conclusions of the study are.

Reliability tells you how consistently you can measure something. Just as was the case with validity, there are a few types of reliability. First, internal reliability tells you whether a measure consistently captures the same results within that same measure. For example, if there are eight questions on grit on a survey, how consistently do people answer those eight questions? You wouldn't necessarily expect the answers to all be the same, because there are various dimensions of grit. But you wouldn't expect the results to be that different; otherwise, the eight-question measure wouldn't be a very reliable way to capture information on an individual's grit. Second, test-retest reliability tells you whether the measure consistently captures the same content over time. If I take the grit scale today, and then again in a month, how consistent are my responses going to be? They may change slightly based on context, which we would expect, but we wouldn't expect them to change significantly in a short period of time without any form of intervention. Otherwise, that wouldn't be a particularly reliable measure. And finally, there's inter-rater reliability. In the case I shared earlier about self-control, perhaps the teachers and parents rating the kid don't have the same perceptions about what self-control is, so they would rate the kid differently.
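The three kinds of reliability each have a standard statistic behind them, and a small sketch can make them concrete. This example is not from the lecture: it uses Cronbach's alpha for internal reliability, a Pearson correlation for test-retest reliability, and Cohen's kappa for inter-rater agreement, and all of the response data are invented for illustration.

```python
from statistics import pvariance
from collections import Counter

def cronbach_alpha(items):
    """Internal reliability: `items` holds one list of respondent scores per question."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    return k / (k - 1) * (1 - sum(pvariance(q) for q in items) / pvariance(totals))

def pearson_r(x, y):
    """Test-retest reliability: correlation between scores at two time points."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) ** 0.5
                  * sum((b - my) ** 2 for b in y) ** 0.5)

def cohens_kappa(rater1, rater2):
    """Inter-rater reliability: agreement between two raters, corrected for chance."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    p_chance = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Invented 8-item grit scale: one list of five respondents' answers per item
grit_items = [
    [1, 2, 3, 4, 5], [2, 2, 3, 4, 5], [1, 3, 3, 4, 5], [1, 2, 3, 4, 4],
    [1, 2, 4, 4, 5], [2, 2, 3, 5, 5], [1, 2, 3, 3, 5], [1, 2, 3, 4, 5],
]
print(round(cronbach_alpha(grit_items), 2))  # 0.99: the eight items hang together

# Invented scale totals for the same four people today and one month later
print(round(pearson_r([2, 4, 4, 6], [3, 4, 5, 6]), 2))  # 0.95: stable over time

# Invented parent vs. teacher ratings of six kids' self-control
parent = ["high", "high", "low", "low", "high", "low"]
teacher = ["high", "low", "low", "low", "high", "low"]
print(round(cohens_kappa(parent, teacher), 2))  # 0.67: decent but imperfect agreement
```

Note that kappa corrects for chance: the parent and teacher agree on 5 of 6 kids, but with only two categories they would agree half the time by luck alone, so the chance-corrected agreement is lower than the raw 83%.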
To increase inter-rater reliability, you need clear definitions and frames of reference in the survey questionnaire, and perhaps even training. Now, Einstein was right that there's no perfect way to measure some constructs, but some methods are more valid and reliable than others. In the next video, we'll explore quantitative methods for data collection, and then in the final video we'll look at qualitative methods.

Quantitative and qualitative methods offer a trade-off between breadth and depth, and between generalizability and targeting specific populations. Quantitative methods allow you to collect data from a broader population, which can make the results more generalizable beyond just the data you collected. For this reason, quantitative data is most helpful in assessing trends. However, it doesn't tell you why or how a change occurred, which requires more in-depth analysis. Qualitative methods complement quantitative methods by providing more detailed descriptions of phenomena. Let's turn now to look at quantitative methods in more detail.