Episode three. What makes a good measure? Measures and measurement for clinical purposes have been an integral part of healthcare for decades. These range from early efforts to determine arterial blood pressure using a blood pressure cuff to more recent efforts to find ways of detecting early lesions in brain tissue using CAT scans. By contrast, the use of measures and measurement to determine quality in health care is much more recent. It has become widespread only in the last two decades. As with every area of measurement, there are quality measures that are useful and others that are misleading. So, what makes a good quality measure? For virtually every type of quality measurement, in every measurement situation, there are a few critical questions that underlie the determination of whether the measures will actually be useful. Fortunately, there's been some solid work done to explore this topic, and several sets of criteria have been developed to help us determine what constitutes a good measure. The set of criteria listed on the U.S. National Measures Clearinghouse site builds on many earlier efforts and is close to a consensus standard. There are basically three questions that need to be considered when selecting a measure for use in practice. One, is the measure important? Two, is it scientifically sound? And three, is the measure actually feasible to use in practice? Let's take a closer look at each of these criteria. First of all, importance. The measure must reflect an issue that is, first and foremost, important to all those who'll be using or be affected by the measure. This usually begins with importance to patients and extends to all stakeholders, including you and your practice and, in cases involving external reporting or the use of measures by insurers, to payers and regulatory bodies. The second aspect of importance is related to the impact that the disease or condition has on health in general.
Does the condition cause a substantial burden to patients in terms of pain, disability, or death, and is it at least fairly common in some practices? In addition to determining the impact on the overall health of a population, this will be an important consideration when we look at data collection and determine the feasibility of the measure. A third element of importance that is coming to the fore, as we recognize the large health disparities that exist in many areas, is whether or not the measure is useful in documenting differences in health status or outcomes between populations with different health literacy, socioeconomic status, race, or other factors related to equity. The importance of a measure also depends on your ability to show that there is substantial variation in the data obtained from using the measure, and that the average performance on the measure indicates substantial room for improvement. Moreover, a measure needs to be linked to one or more interventions that can be carried out within the health care system and that have been shown to substantially improve performance. Let's look at a brief example of assessing the importance of a measure. The American Diabetes Association has estimated the cost of medical care for diabetes in the United States alone to be about $176 billion per year. The number of people with diabetes worldwide is estimated to be over 380 million. There are also ample data showing that poorly controlled diabetes leads to blindness, renal failure, accelerated cardiovascular disease, and premature death. There's evidence of major disparities in the care of patients with diabetes, and a large literature documenting variation in the degree of control of parameters such as blood pressure, cholesterol, and blood sugar levels.
Finally, there is research showing that control can be improved by various interventions, including more consistent and reliable care, and that better control has substantial effects on outcomes such as renal failure or heart attacks in this patient population. While the importance of diabetes as a focus for measurement seems obvious, there are unfortunately measures that have been developed and proposed for general use that focus on relatively rare conditions, or on conditions that result in limited harm to patients, or that are applied where there's virtually no variation in performance or where average performance is already near 100%. A second core attribute of useful measures is scientific soundness. Measures clearly need to be based on solid scientific evidence that shows a very close link between the measure, be it a structure, process, or outcome measure, and a chain of reasonable clinical logic. The clinical logic needs to link the types of health conditions seen in the clinical setting to the outcomes of care. For example, there's clear clinical logic linking doing a foot exam, to documenting a neuropathy, to providing the patient with instructions about foot care, to potentially reducing the likelihood of lower limb loss in diabetes. Professional societies, researchers, and others involved in measure and guideline development are engaged in efforts to evaluate the science that is useful in creating a measure. In the US, there is a further step, in that the National Quality Forum reviews all aspects of measure development, including the strength of the scientific evidence submitted by measure developers who are seeking NQF endorsement for their measures. While most of us do not have the time or the inclination to delve into the scientific research related to a measure, it is still important for us to have some general knowledge of what to look for in terms of scientific soundness.
This matters because, unfortunately, there are still many examples of measures in use that have a weak or highly questionable scientific base. The first place to start is to see whether the scientific evidence base for the measure is clearly noted in the measure description, in the specifications posted by the measure developer, or on the National Quality Forum or National Measures Clearinghouse sites. The latter is easily searchable and is in the public domain. Most well-documented measures will have a clear explanation, or even a numerical classification, of the strength of the evidence underlying the measure, progressing from expert opinion, to case-control studies, to one or more small randomized controlled trials, to multiple congruent large randomized controlled trials at the highest level of evidence. There should also be information about how the validity of the measure has been determined, and whether testing to determine the reliability of the measure has been done in one or more settings or with different populations. Because reliability and validity are such important concepts, we'll digress briefly to define them more carefully. Reliability refers to the property of a measure that, when it is applied in most situations, it will produce results that are consistent across similar situations and settings, irrespective of who does the measurement or when it's done. In other words, when we use the same measure in two similar situations, or within the same situation at different times, we should get the same or close to the same results. To use an illustration, a yardstick that shrinks or expands a great deal with changes in temperature would not be a reliable way to measure and compare the length of a board between the Arctic and the tropics. It should be noted that even measures that have been shown to be fairly reliable in one population may lose reliability if they're used in populations that are quite different.
For example, a health survey using questions set at a sixth-grade reading level may be shown to be very reliable in the adult population in most areas of the US, but will likely have very low reliability if used in rural areas of developing countries with very low educational levels. As we'll note later when we talk about measurement itself, reliability should be examined within the setting and the population to which the measure is actually applied, but that's frequently not done. Validity refers to whether a measure actually gives the information that it purports to give. In other words, does the measure we are using actually answer the question that we think we're posing? A valid measure is one that gives us direct information about what we want to know; its findings represent what we intended to measure. For example, a measure that allows the use of a pediatric blood pressure cuff in trying to estimate blood pressure in obese adults would not produce valid results. Another factor that's considered in determining scientific soundness is whether or not the measure is structured in such a way that it can address case mix, in other words, adjust for factors that strongly influence the outcome of the measure but are outside the control of the health system itself. A process measure like measuring blood pressure yearly in patients with diabetes is not likely to be strongly influenced by health literacy, socioeconomic group, or age, although there may be some minor effect. However, there's ample evidence that health literacy, socioeconomic status, and age do affect efforts to actually control blood pressure in populations of people with diabetes. Obtaining the data to make such case-mix adjustments, and deciding whether doing so is advisable, is a topic best left to measurement bureaus and oversight groups, but all users of measures should have some basic understanding of the issue of case mix. Measures also need to be understandable to those using them.
If the measure is only for internal quality purposes, then it needs to be well understood by providers and staff. But if it's going to be reported to the public, it obviously must meet a far higher standard for being understandable. For example, a measure of ejection fraction in patients with congestive heart failure might be useful for cardiologists doing an internal improvement project, but without some careful translation it would be meaningless to patients in a public report card on congestive heart failure. Even widely used measures like hemoglobin A1c in diabetes are of uncertain utility in a public report unless they're carefully framed, perhaps as a number that shows how well your diabetes is being treated, as part of the information that's reported. A final desirable attribute for a measure relates to feasibility. That is, can the measure actually be used in most common settings and situations? This is a frequently overlooked but very critical factor in choosing a measure. In looking at feasibility, the full description and specification of the measure should be readily available. It's especially important that there be a full and complete specification for the numerator, in essence what exactly counts as meeting the measure, and for the denominator, that is, the population to be included in the measure and, either explicitly or implicitly, who is not included. For example, what appears to be a simple measure of blood pressure control requires detailed specifications regarding precisely which blood pressure readings are included. Is it any blood pressure? The last blood pressure? One done in the office, or one done at home? And for the denominator, precisely who is defined as being hypertensive? Any parameter that is not clearly specified is a likely spot for error. For instance, in looking at biennial mammography in women 50 to 65, do we use a woman's age as of January 1st of the measurement year, or her age when the exam is done?
Is the interval between exams precisely 24 months, or is there some leeway given? Most organized programs of measurement have found it very helpful to provide technical assistance to those doing the measurement, and often find it necessary to audit the results for accuracy, especially if the measures are being used for payment or public reporting. It is also critical to know precisely what data are needed to collect a measure. In our blood pressure example, do the readings come only from the primary care office, or from all providers who see the patient in a given time period? Does it include home monitoring, either by the patient or using some other medical device? While much of this gets into the weeds of measuring, if the measure is written and specified in an imprecise or fuzzy way, it will be impossible to get useful results. Finally, for a measure to be feasible, the data collected for it must also be reasonably accessible and complete, and collected at a reasonable cost. All data needed to define the measure's numerator, denominator, time frame, and collection process must be easily, or at least reasonably, accessible and complete. Data that are spread across multiple charts, in different parts of a chart, or in different data fields in electronic medical records make it time-consuming and sometimes costly to retrieve. Data that are partly missing or coded differently may make measurement difficult or almost impossible, and will certainly affect reliability and validity and add to the likelihood that the information obtained will be unusable. The incredible promise of measures being easily extracted from electronic health data has been slowed or halted by problems of data being filed in different records with different standards, and of parameters being filed in different fields with different labels or missing altogether. However, even though progress is slow, there continues to be the promise of easily retrieved measures using electronic data.
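To make the specification questions in the mammography example concrete (women 50 to 65, a 24-month interval, possible leeway), here is a minimal Python sketch of how a numerator and denominator might be computed once those choices are written down explicitly. Everything in it is hypothetical and is not taken from any published measure specification: the `Patient` record, the helpers `age_on`, `months_between`, and `mammography_rate`, the `age_anchor` options ("jan1" or "period_end"), the `grace_months` leeway parameter, the December 31 end of the measurement period, and the sample patients. The sketch anchors age to a fixed date rather than to the exam date. Age at the exam cannot place a woman who had no exam in the denominator at all.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical minimal record; real EHR data would be far messier.
    patient_id: str
    sex: str                          # "F" or "M" in this toy example
    birth_date: date
    last_mammogram: Optional[date]    # most recent screening exam, if any


def age_on(birth_date: date, as_of: date) -> int:
    """Completed years of age on the given date."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def mammography_rate(patients: List[Patient], period_end: date,
                     age_anchor: str = "jan1", min_age: int = 50,
                     max_age: int = 65, interval_months: int = 24,
                     grace_months: int = 0) -> Tuple[int, int]:
    """Return (numerator, denominator) for a hypothetical screening measure.

    age_anchor: "jan1" uses age on January 1 of the measurement year;
                "period_end" uses age on the last day of the period.
    """
    if age_anchor == "jan1":
        anchor = date(period_end.year, 1, 1)
    elif age_anchor == "period_end":
        anchor = period_end
    else:
        raise ValueError(f"unknown age_anchor: {age_anchor!r}")

    # Denominator: who is eligible, stated explicitly (and so, who is excluded).
    denominator = [p for p in patients
                   if p.sex == "F"
                   and min_age <= age_on(p.birth_date, anchor) <= max_age]

    # Numerator: eligible women whose last exam is fewer than
    # interval_months + grace_months whole months before the period end.
    window = interval_months + grace_months
    numerator = [p for p in denominator
                 if p.last_mammogram is not None
                 and p.last_mammogram <= period_end
                 and months_between(p.last_mammogram, period_end) < window]
    return len(numerator), len(denominator)


# Made-up sample data for a measurement period ending December 31, 2024.
patients = [
    Patient("a", "F", date(1974, 6, 1), date(2023, 3, 10)),   # turns 50 mid-year
    Patient("b", "F", date(1960, 2, 1), None),                # never screened
    Patient("c", "F", date(1965, 5, 5), date(2022, 11, 15)),  # 25 months before period end
    Patient("d", "M", date(1960, 1, 1), date(2024, 1, 1)),    # excluded by sex
]
end = date(2024, 12, 31)

print(mammography_rate(patients, end))                           # (0, 2)
print(mammography_rate(patients, end, age_anchor="period_end"))  # (1, 3)
print(mammography_rate(patients, end, grace_months=3))           # (1, 2)
```

The same four made-up records produce 0 of 2, 1 of 3, or 1 of 2 depending only on how age is anchored and whether any leeway is allowed. The data never change. Only the choices that an unclear specification leaves open do. A blood pressure control measure would need the same kind of explicit decisions about which readings count and who belongs in the denominator.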
Lastly, there are some practice issues that may affect your ability to use a given measure or set of measures. One factor to consider is what the potential impact of the project may be on those who will have to participate in the measurement effort. Will the measure add more work to the practice, in terms of new requirements for collecting or entering data? A lack of enthusiasm for, or even active opposition to, the process can create more of a problem than choosing the wrong measure. Will the act of collecting data affect the behaviors of providers or others? Will these issues in any way interact with the measurement itself? As you can see from this episode, understanding what makes a useful measure can be a somewhat complicated process. However, with some reasonable attention to the criteria for useful measures, there are usually many good choices for measures. An often-quoted mantra in measurement is simply: don't let the perfect be the enemy of the good.