Welcome to Health Indicators and Descriptive Epidemiology. I'm Keri Althoff from the Department of Epidemiology at the Johns Hopkins Bloomberg School of Public Health. The objectives for today are to describe health indicators and the characteristics of good health indicators, and to introduce the following descriptive epidemiology categories: person, place, and time. We will have an exercise to build skills in descriptive epidemiology. Using United States-based, age-adjusted liver cancer incidence rates, we will create plots to investigate by person, place, and time. The supporting reading for today is from the US Centers for Disease Control and Prevention. This resource provides examples of describing health outcomes and indicators by person, place, and time. I find it very helpful in elaborating the key concepts of descriptive epidemiology that I will present. Let's get started with health indicators. Health indicators are important measurements that we will further describe. The origin of the statement "What gets measured gets done" is unknown and debated, but it is applicable to the concept of health indicators. What is a health indicator? The word indicator in this phrase signals that something is being measured. Here I've posted definitions of health indicator. Notice the word "measure" appears in each of these definitions. Additionally, the definitions allude to a population: this measurement is made within a population. Finally, and most obviously, it's health-related. So a health indicator is a measurement of health for our population. Now let's dig a little deeper on what a health indicator is. A health indicator represents a construct or underlying idea of health. For example, how do we know if a population is healthy? To answer that question, we may look at an indicator that is impacted by all types of diseases and injuries, such as life expectancy.
We also may want to look at health indicators that are focused on some of the more vulnerable subpopulations of the population, such as young children. The infant mortality rate is a well-known and frequently used health indicator that tells us about not only the health of infants, but also the health of the overall population. The infant mortality rate is measured as the number of infant deaths divided by the number of live births in the population. It is often standardized to 1,000 live births by multiplying the rate by 1,000. The underlying construct of the infant mortality rate is not only the health of infants; it also rests on the idea that the infant mortality rate can tell us something about the population's overall health. The infant mortality rate may not perfectly or completely measure the health of a population, but it is a health indicator often included in a battery of health indicators that help us paint a picture of the population's health. In the US, the Healthy People Initiative sets objectives to improve the health of the American population. One of the Healthy People objectives is to reduce the infant mortality rate. As we can see in this CDC map of the infant mortality rate in the US, there are regions of the country where the infant mortality rate is higher, such as the Southern states shown here in the darker blue color. These states also have higher rates of smoking, premature death, and obesity. Infant mortality rates suggest the same conclusion as these other indicators: the population in the Southern US is less healthy than in other geographic areas of the country. Selecting the health indicator for an underlying construct of interest is truly the tip of the iceberg; we will get to that in the next section. But before we do, we need to think about measuring and analyzing the health indicator, because it's an epidemiologist's job to know where the data come from and how they're analyzed.
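To make the infant mortality rate calculation described above concrete, here is a minimal Python sketch. The function name, the `per` parameter, and the counts in the example are hypothetical and chosen only for illustration; they are not real data.

```python
def infant_mortality_rate(infant_deaths: int, live_births: int, per: int = 1000) -> float:
    """Infant deaths divided by live births, scaled to `per` live births."""
    if live_births <= 0:
        raise ValueError("live_births must be a positive number")
    return infant_deaths / live_births * per


# Hypothetical counts for illustration only: 56 infant deaths, 10,000 live births.
imr = infant_mortality_rate(infant_deaths=56, live_births=10_000)
print(f"Infant mortality rate: {imr:.1f} per 1,000 live births")
```

Both counts must come from the same population and the same time period for the result to be meaningful.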
The number of infant deaths and the number of live births are needed for the infant mortality rate indicator. The availability of these data must be considered in order to calculate the infant mortality rate health indicator. Additionally, the validity of the health indicator refers to our understanding of the extent to which problems with the data are impacting the estimated health indicator. When analyzing health indicators, data tabulation and aggregation must also be considered. Let's talk more about these four ideas: availability, validity, tabulation, and aggregation. The data needed to calculate the health indicator must be available if the health indicator is going to be calculated. The primary sources of data for many of the most frequently used health indicators are public and routine health information systems. Data from surveillance systems, surveys, registries, vital registration systems, medical records, and administrative databases are all used for some of the most well-known health indicators, like life expectancy, overall mortality rate, and disease incidence rates. If the data are not available, the health indicator cannot be calculated. Because many health indicators are estimated using data collected via routine health information systems, we evaluate the systems in order to understand the validity of the data produced by the system. A health information system is considered to be producing valid data if the data are timely, accurate, and complete. There are frameworks that scrutinize the quality of health indicators and data in routine health information systems. Things like unclear or confusing forms used to collect data, problems with pulling information from a survey or medical record, data errors, and calculation errors are all addressed in the PRISM framework. RHINO is another framework focused on strengthening routine health information systems in low-resource settings.
They have produced the data quality audit tool and the routine data quality assessment tool to check on the validity of the data in routine health information systems. There are numerous challenges to both data availability and validity: nonexistent or weakened health information systems, perhaps due to barriers imposed by bureaucratic infrastructure; a charged political environment; unstable economic situations, because these systems do require continuous financial commitment; and finally, these systems are built and run by skilled technicians, so a lack of a skilled workforce will definitely create challenges to data availability and validity. Each health information system faces its own barriers in its own context. I encourage you to rely on the PRISM and RHINO tools to evaluate the routine health information systems in your community, to ensure the timely, accurate, and complete data that are needed to estimate health indicators. If data are available and valid for the health indicator of interest, tabulation and aggregation are two key concepts to keep in mind when analyzing health indicators. First, tabulation of the data refers to how you are estimating that indicator over time. You will want to be sure the definition does not change over time, and definitions often have many layers. The first layer may be how you would explain the health indicator to a community member. For example, you may say that the cancer mortality rate is the rate of death due to cancer. The next layer of that definition is being more specific about the numerator and denominator, which is something that's more likely to be of interest to your public health colleagues. For example, is the number of cancer deaths in the numerator identified by cause of death on the death certificate? If so, which ICD-9 or ICD-10 codes are used to identify a cancer death specifically? Is the denominator the estimated population in each calendar year of interest?
Finally, the third, more technical layer is important for the person actually calculating the estimate. For example, was the analyst looking at the underlying cause of death when tabulating the numerator? Tabulation refers to all of these layers, as the construct, the definition, and the technical details are all important when tabulating the health indicator. Luckily, there are standard definitions for many health indicators that are endorsed and used by important international health organizations, like the World Health Organization, or WHO. Building on the example of the cancer mortality rate, mortality rates specific to a disease are regularly estimated in most populations. Cause-specific death rates are very important health indicators, because they drive policymakers to invest more resources in understanding the causes and prevention of more burdensome diseases. Standard definitions must be used when looking at trends across calendar time and across populations. If the health indicators are not measured consistently across time or populations, one cannot say that a change over time or a difference between populations can be attributed to other causes until the health indicator is tabulated consistently across time or populations. Finally, data aggregation. Data aggregation refers to how any groupings applied to the indicator must be applied consistently to the numerator and the denominator for health indicators that are proportions or rates. For example, if you are looking at the mortality rate of female breast cancer, the number of female breast cancer deaths would be in the numerator and the denominator would be restricted to females. This is not so challenging when you have individual-level data, meaning that you have one or more rows of data per person in your dataset. The challenge comes when you do not have access to individual-level data.
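With individual-level data, applying the same grouping to the numerator and the denominator can be sketched as below. This is an illustration under stated assumptions, not a standard definition: the `Person` record, its field names, and the simulated records are hypothetical, and the denominator is simplified to a count of people rather than person-time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    # Hypothetical individual-level record: one row per person.
    sex: str                      # "F" or "M"
    died_of_breast_cancer: bool   # death attributed to breast cancer in the period


def female_breast_cancer_mortality_rate(people: List[Person], per: int = 100_000) -> float:
    """Restrict BOTH numerator and denominator to females before dividing."""
    females = [p for p in people if p.sex == "F"]
    if not females:
        raise ValueError("no females in the dataset")
    deaths = sum(p.died_of_breast_cancer for p in females)
    return deaths / len(females) * per


# Made-up records for illustration only.
people = (
    [Person("F", True)] * 2
    + [Person("F", False)] * 998
    + [Person("M", True)] * 1
    + [Person("M", False)] * 999
)
print(female_breast_cancer_mortality_rate(people))
```

Because the filter on sex is applied once and both counts are taken from the filtered list, the male death and the male population are excluded from the numerator and the denominator together.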
Often, protecting the privacy of individuals in the dataset means that the group or agency responsible for collecting the data will only share grouped or aggregated data. In this situation, you will need to request the data with specific aggregations, particularly if you are going to merge the data with another source. For example, if you were estimating the cancer mortality rate by age, you may use vital statistics data for the number of cancer deaths within a specific age group. Those age groups need to match the age groups in the aggregated data available from the source of the denominator, for example, census data. It is important to note that if you are looking at a rare disease or a highly stigmatized disease, there may be policies and laws that prevent an agency from giving you data within the groups you specified, because there isn't a high enough count in that cell. For example, if you want to know the number of breast cancer cases among women ages 20-24 in a small community, the agency that has those data may tell you that they can only give you the age group of 20-29, 10 years instead of five, because there have to be more than 10 breast cancer cases in each group requested in order to protect the privacy of these young females with breast cancer. Agencies that hold these data often have their own minimum count requirements; 10 may be it, or it may be some other number, which is represented here with the letter x. So data aggregation is simple in concept but challenging in application. The take-home message is that your groups must be the same in your numerator and denominator, and if they are not, you must combine groups together in either the numerator or denominator until they match. Now that we know what health indicators are and four important characteristics of health indicators, we will discuss selecting the best health indicator in the next section.
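The take-home message about matching groups can be sketched in Python as follows. This is only an illustration: the function names, the age-group labels, the counts, and the default threshold of 10 (standing in for the agency-specific x) are all assumptions, not data or rules from any real agency.

```python
from typing import Dict, Optional


def collapse_groups(counts: Dict[str, int], mapping: Dict[str, str]) -> Dict[str, int]:
    """Combine fine groups (e.g. five-year ages) into coarser ones by summing."""
    combined: Dict[str, int] = {}
    for group, n in counts.items():
        target = mapping[group]
        combined[target] = combined.get(target, 0) + n
    return combined


def rates_by_group(
    deaths: Dict[str, int],
    population: Dict[str, int],
    per: int = 100_000,
    min_cell: int = 10,  # hypothetical threshold x; each agency sets its own
) -> Dict[str, Optional[float]]:
    """Rate per group; requires identical groups in numerator and denominator."""
    if deaths.keys() != population.keys():
        raise ValueError("numerator and denominator groups do not match")
    result: Dict[str, Optional[float]] = {}
    for group, d in deaths.items():
        # Suppress cells whose count is not greater than the threshold.
        result[group] = None if d <= min_cell else d / population[group] * per
    return result


# Made-up numbers: deaths arrive in ten-year groups, population in five-year groups.
deaths = {"20-29": 12, "30-39": 40}
population_5yr = {"20-24": 5_000, "25-29": 7_000, "30-34": 8_000, "35-39": 8_000}
to_10yr = {"20-24": "20-29", "25-29": "20-29", "30-34": "30-39", "35-39": "30-39"}

population_10yr = collapse_groups(population_5yr, to_10yr)
print(rates_by_group(deaths, population_10yr))
```

Here the denominator is the side that gets combined, because the numerator was only released in ten-year groups. Passing the uncollapsed five-year population raises an error instead of silently producing mismatched rates.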