Hi again. In our previous lesson, we covered data validation methods. In this lesson, we'll expand on that to introduce a framework for validating and verifying data. We'll review the distinction between validation and verification, comparing and contrasting the two. After this lesson, you'll be able to recall and explain the three concepts that we're checking when we apply validation or verification of data. You should also be able to discuss monitoring and maintaining data quality throughout the data lifecycle. Let's get going.

Let's review the distinction between validating and verifying data. Validation focuses on how well data aligns with values and standards that come from broad, external benchmarks. Verification focuses on how data values match our understanding of local practices, of what we do locally. Using these two contexts gives us some ability to see where our data is of high quality and where we are recognizing issues. Looking at local knowledge, verification tells us when our data is changing because of local reasons. The focus on our own practices, on our own community, starts with a basis of understanding that allows us to notice when those communities are changing, and when those populations are doing something we don't expect. Validation, focusing on external benchmarks, gives us more of a sense of how our data fits relative to broader, more scientific standards. Verification, focusing on places where we can line up with local data, gives us a sense of how our data aligns with what we already know and understand.

For example, if we're thinking about gender, we can look at the value of gender and know that it will be either male or female. If we're talking about biological gender, we'll know basically what our patient population looks like locally, and we'll be able to see how the data varies from what we would expect. We can also talk about regional concepts: gender identifiers will reflect either male or female.
Sometimes, these are recorded in the database as ones or twos. We can look at it logically and computationally and see that only those values are present. If we see other values, we'll know something is wrong. If you're checking gender identifiers, expecting ones and twos, and you come across sixes and sevens, you'll know to check and figure out why the values are out of range. In dealing with this, we can also enforce completeness: we know that someone must be either biologically male or female, perhaps as recorded on the basis of a genetic test. We'll also think about qualities like uniqueness. At any one point, an individual patient would be either biologically male or female. We know this is something that would not change over time, so temporally, we would expect the captured value to be the same whenever we captured that data.

Looking at it from a validation standpoint, in the context of external benchmarks, we would expect gender values to be basically male or female. We would want to know that, relationally, all of our values are consistent across all our datasets, and that the distribution of male and female within the population does not change significantly over time.

There are three underlying concepts that we're checking when we apply validation or verification. They are conformance, completeness, and plausibility. Conformance is the principle that we expect our data to conform to the standards established in our community. When we're talking about conformance in terms of data, does the number make sense? For instance, if our data were to give us the value of a baby's weight, we might expect it to be in the range of two to 15 pounds. If we saw a value showing a baby's weight as 5,000 pounds, that would not conform to established norms. Are we talking about a 5,000-pound baby? That number doesn't make sense.
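The checks described above can be sketched in a few lines of code. This is a minimal illustration, not a real data quality tool: the gender codes (1 and 2) and the two-to-15-pound weight range come from the lecture's examples, while the record layout and field names (`gender`, `birth_weight_lbs`) are hypothetical assumptions.

```python
# Illustrative conformance and completeness checks, assuming a simple
# dict-based record with hypothetical field names.

VALID_GENDER_CODES = {1, 2}      # 1 = male, 2 = female, per the lecture example
BABY_WEIGHT_LBS = (2.0, 15.0)    # expected birth weight range in pounds

def check_record(record):
    """Return a list of data quality problems found in one record."""
    problems = []

    # Conformance: only the expected categorical codes should appear.
    gender = record.get("gender")
    if gender not in VALID_GENDER_CODES:
        problems.append(f"gender code {gender!r} is out of range")

    # Completeness: the weight should be present at all.
    weight = record.get("birth_weight_lbs")
    if weight is None:
        problems.append("birth weight is missing")
    # Conformance: a present weight should fall in the expected range.
    elif not (BABY_WEIGHT_LBS[0] <= weight <= BABY_WEIGHT_LBS[1]):
        problems.append(f"birth weight {weight} lbs does not conform")

    return problems

# A gender code of 6 and a 5,000-pound weight both get flagged:
print(check_record({"gender": 6, "birth_weight_lbs": 5000}))
```

A clean record, say a gender code of 1 with a 7.5-pound weight, returns an empty list of problems.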
However, we can start to analyze our data and see if it fits the expected range in another way. Perhaps the baby is being weighed in grams. Another underlying concept is completeness. We check to ensure that our data is present, and that we don't have missing data where we don't expect missing data. Finally, plausibility asks: does the data make sense? You're checking whether it is consistent with other facts on record. Plausibility covers questions like: do we expect the data to be unique and unchanging over time, like biologic sex? Or is it something we would expect to change over time, like age? Age is something we would expect to change consistently, while a person's weight might change unpredictably over time.

Data quality is a principle that needs to be designed into computer systems as we build them and into our software as we design it. As we create quality management programs, we look at these aspects and build checks to make sure they happen and that these characteristics are present in our data. Even as we go back up the pipeline, back through the data lifecycle, when we build computer systems, we often do validity checks on the data as we collect it, to ensure we are collecting it correctly and capturing it accurately so that it will be useful later. Along those lines, you can see that data quality is really something we bake into our systems from the beginning. From this standpoint, our plans to do analysis are something we begin with and plan through the entire data lifecycle, through our creation and integration of the computer systems we use to capture data.

In this module, we've talked about some metrics of data quality and how we can assess them. That brings us to the end of this module, and we'll look forward to seeing you in the next one. Thanks.