Hello everyone. Welcome back. In this lecture, we're going to talk about how to measure processing quality for designed survey data. Going back to our big picture of the total data quality framework, we're now moving on to the processing dimension on the measurement side. Here we're concerned with the actual data values that were recorded in response to different variables or items in a designed data collection. We're going from the reported data values to the actual edited data, that is, the data that we might ultimately use for data analysis. We want to think carefully about how we can measure the quality of the processing step, where we go from the data as collected to the edited data that we use for our analysis. That's going to be our focus in this particular lecture.

With designed data, how do we go about measuring the quality of the data processing? When human beings are coding open-ended responses in designed data collections, one popular technique is to use multilevel models. We talked about the use of multilevel models for looking at interviewer effects as a form of data origin quality. When coding is part of our processing operation in a given designed data collection, we can use the same types of models to estimate between-coder variance in commonly coded indicators. If we have multiple people processing the open-ended responses, we can look at the between-coder variance in the indicators that they're ultimately coding and editing for the final data set. There's an example of this by Pam Campanelli and colleagues in a 1997 book chapter, which you can read a little bit more about in the references.

The way we would do this is to fit a multilevel model to a given coded indicator. For example, was an open-ended occupation coded as blue collar or not? That indicator coded by our human data coders could be a one for yes and a zero for no, whether or not it was a blue-collar occupation. Then, in that model, we would include random effects associated with a coder ID variable in the data, much like we talked about with using an interviewer ID variable when looking at interviewer effects. In the same way as in that interviewer effects example, we want to see whether the variability of the random coder effects is greater than zero. Are different coders ultimately coding different types of responses on this variable of interest? Again, this is much like our example of looking at interviewer effects. (A brief sketch of such a model appears below.)

Another way that we can look at the quality of the data processing in the case of designed data is to compute the reliability or agreement of edits or codes that are entered by independent editors or coders, starting with the data that were actually collected in the survey and, again, going to that step of editing the data and producing final values for analysis. As an example, you might have two human coders who are coding the open-ended reports of occupations into an indicator of being a blue-collar job. In this case, the two human coders are coding the same response for the same survey respondent, so we're getting two codes of that open-ended report for the same respondent. Given these two codes from the two independent human coders, we can then compute the kappa statistic as a measure of agreement between the two coded indicators. A higher value on this kappa statistic would suggest that the two codes are more reliable in general. This is known as intercoder or between-coder reliability, or agreement between different coders.
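To make the multilevel modeling idea concrete, here is a minimal sketch in Python using statsmodels, with entirely hypothetical coded data. A logistic mixed model would be the more standard choice for a 0/1 indicator; a linear random-intercept model is used here only to keep the sketch short.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical coded data: one row per coded response, with the ID of the
# coder who handled it and the binary indicator that coder produced
# (1 = blue collar, 0 = not blue collar). Real data would have many more
# responses per coder.
df = pd.DataFrame({
    "coder_id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    "blue_collar": [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1],
})

# Random-intercept model with coders defining the clusters. The estimated
# variance of the random coder effects tells us whether different coders are
# producing systematically different codes.
model = smf.mixedlm("blue_collar ~ 1", data=df, groups=df["coder_id"])
result = model.fit()

print(result.cov_re)      # estimated between-coder variance component
print(result.summary())
```

If the estimated between-coder variance is clearly greater than zero, that is evidence that the coders are not interchangeable, just as interviewer variance signals interviewer effects.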
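And here is a sketch of the intercoder agreement computation, again with hypothetical codes, using the Cohen's kappa implementation in scikit-learn.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical blue-collar codes (1 = yes, 0 = no) assigned independently by
# two coders to the same set of open-ended occupation reports.
coder_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
coder_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Cohen's kappa measures agreement between the two coders, corrected for the
# agreement we would expect by chance alone. Values closer to 1 indicate more
# reliable coding.
print(cohen_kappa_score(coder_1, coder_2))
```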
This is a very important quality metric when we talk about processing open-ended data in designed data collections.

For designed data, data processing also includes editing of recorded values on survey variables. For example, is the recorded age between some minimum and maximum allowable values on that particular variable? The minimum and the maximum might be 18 and 120 for a variable like age. The question is whether there is sufficient editing of the values on the variables of interest, or whether some highly unusual values are getting through the editing process into the final data set.

Data processing could also include what's known as imputation of missing values for selected survey variables. When we talked about data origin quality, we talked about the importance of assessing missing data rates on individual survey variables. One important processing step, given the presence of missing data on individual variables, is known as imputation. This is the idea of predicting the values that should have been observed based on the values of other variables that were actually observed in the data set. In other words, we're filling in those missing values for particular variables based on predictions of what those values should be, according to relationships with other observed variables in the survey data set. (A brief sketch of this idea appears below, after the quality indicators.) The processing stage includes both editing of recorded values and imputation of these missing values, and it's important to pay attention to the quality of each of these steps.

Processing could also include construction of survey weights for probability samples. When we're working with designed data collections and we've selected probability samples, the data producers need to include survey weights in the data files to allow the survey respondents to speak to the larger population that we might be interested in. The weights in the data set allow data analysts to compute weighted estimates that reflect what the population should look like, based on the probabilities of selection for the different respondents to the survey. We'll discuss imputation when we get to the missing data dimension on the representation side, and we'll discuss the use of weights when we get to the analysis stage. But these are very important processing steps whose quality we want to measure as part of the overall data processing, and we'll talk more about them in the examples.

Simple quality indicators, given the editing steps we just talked about, include error rates for each of the different variables and, for a given case, the fraction of variables with out-of-bounds values. Then there are average error counts for each data record: are we seeing that a lot of data records are failing these edit checks, given the allowable values on the variables of interest? We can compute those error counts and error rates for individual variables or for cases in general. We could count up the number of edit failures for a given survey respondent and then calculate the average number of errors per respondent, and again see whether those errors vary across different coders or different aspects of the processing step. We can also look at variability in those error counts across different survey respondents.
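Here is a minimal sketch of these edit-check indicators in Python with pandas, using made-up data and assumed allowable ranges.

```python
import pandas as pd

# Hypothetical recorded survey data, with a couple of deliberately
# out-of-range values.
df = pd.DataFrame({
    "age":   [34, 17, 52, 150, 41],
    "hours": [40, 38, 80, 55, -5],
})

# Assumed allowable ranges (minimum, maximum) for each variable.
edit_rules = {"age": (18, 120), "hours": (0, 99)}

# Flag every value that fails its range check.
errors = pd.DataFrame({
    var: ~df[var].between(lo, hi) for var, (lo, hi) in edit_rules.items()
})

print(errors.mean())             # error rate for each variable
record_errors = errors.sum(axis=1)
print(record_errors)             # error count for each data record
print(record_errors.mean())      # average error count per record
```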
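And, as promised above, a brief sketch of model-based imputation, here using scikit-learn's IterativeImputer on made-up data. This simply illustrates the idea of predicting missing values from other observed variables; it is not any particular production imputation procedure.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical survey data with two missing income values.
df = pd.DataFrame({
    "age":    [25, 37, 51, 44, 62, 29],
    "hours":  [40, 45, 38, 50, 20, 40],
    "income": [32000, 54000, np.nan, 61000, np.nan, 35000],
})

# Each variable with missing values is predicted from the other observed
# variables, and those predictions fill in the gaps.
imputed = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(df),
    columns=df.columns,
)
print(imputed)
```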
Typically, in computer-assisted data collections, editing routines are built into the overall data collection process. As survey responses are recorded, there are automatic editing routines, so human beings don't have to do that editing after the fact. A post-processing examination of edits then often yields error rates that are very small, because a lot of the errors in the recorded responses are picked up by the computer-assisted data collection system and corrected on the fly as the data are being collected.

Now, for manually entered data, where there is no automated computer checking of the values recorded for particular variables, double data entry is a technique that's often used, with subsequent file comparison. The question is: of all the possible data values, what fraction did not agree between the two data files? If one person entered a given mail survey, for example, and an independent person also entered that exact same mail survey for all the respondents, and you stack those two files on top of each other, are there any disagreements in the entered values in those two files? What is the rate of disagreement between those two data files? That's another very important quality metric for the processing stage when double data entry is used to enter the values recorded in a survey. Double data entry is a very valuable tool for assessing data quality at this stage. (A brief sketch of this file comparison appears at the end of this lecture.)

So what's next, now that we've talked about some of these different metrics for looking at processing quality? We're going to go through two examples of analyzing real survey data and computing and interpreting measures of processing quality for designed data. Then we're going to turn to a discussion of measuring processing quality for gathered data. Thank you.
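As a closing sketch, here is the double data entry comparison described above, in Python with pandas and made-up files: the same surveys entered independently by two people, with the disagreement rate computed over all data values.

```python
import pandas as pd

# Hypothetical double data entry: the same mail surveys entered independently
# by two different people, with rows and columns in the same order in both
# files.
entry_1 = pd.DataFrame({"age": [34, 52, 41, 28],
                        "income": [32000, 54000, 61000, 27000]})
entry_2 = pd.DataFrame({"age": [34, 25, 41, 28],
                        "income": [32000, 54000, 16000, 27000]})

# Fraction of all data values that disagree between the two files.
disagree = (entry_1 != entry_2)
print(disagree.to_numpy().mean())
```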