Welcome back. In this lesson, I will talk about data from a central electronic health record. These are often called electronic health record systems or EHRs. The main purpose of these systems is to document and to manage patient care. Yet, you will soon see that organizations have different types of EHRs and there are numerous types of data contained within these systems. Once the lesson is complete, you will be able to list numerous data types that are found within EHRs and link these to the specific clinical processes that created them. Let's start with the question, where is EHR data collected? Clinical health care data can come from physicians' offices, hospitals, ICU settings at a hospital, the operating room, recovery rooms or the emergency department. It may also come from the patient's home. Patients may have devices that they take home, for example glucometers for diabetics. Home health services, which make visits to homes, also record data in their own systems. Then nursing homes or chronic care facilities often have their own separate documentation on the patient. Overall, the main objective of collecting data is for the provider to have a record of what conditions and interventions patients experienced. Most healthcare organizations now have some sort of central electronic health record. Yet, a few do not. Of those organizations that have electronic health record systems, the system setups and configurations are often varied. Organizations such as hospitals can have varying degrees of administrative systems that they use for other business needs. Some health care organizations do not have a central electronic health record but they may have electronic data repositories based on other aspects of business processes. These can all be leveraged and mined for data. Among healthcare organizations with EHRs, there are degrees of EHR adoption. 
Not all electronic health records are the same, and not all organizations implement the same electronic health record systems in the same way. These differences have a dramatic impact on how analysts can gather data from these systems for secondary use purposes. It's important to think about some of the dimensions of electronic health record data. Data that's gathered during a patient's encounter is considered synchronous data. This is where the clinician is documenting in the electronic health record at the time when they're working with the patient. This data will immediately be available for patient care but the reporting systems may have to wait 24 hours or more to receive this information. Next, there's also asynchronous data. An example of this is when a patient is in a clinic and a test was ordered. The test might be sent out to a lab and the process completes independent of the clinical encounter. This happens with laboratory data, pharmacies and imaging data. These are results that come back after the patient has left the clinic. Another point to think about is whether the data format was free text entry. For example, a clinician types a value into the electronic health record. This manual data entry can increase the likelihood of error. Somebody could misspell a word or inadvertently transpose two letters, and a computer won't know the difference. In addition to errors, this data is notoriously difficult to aggregate across patient populations. Aggregation can be done with natural language processing of the text. This requires, however, appropriate algorithms and technology, and if done incorrectly it can be very error prone. Thus, the preferred method is structured field entry. Structured fields can be select lists. We've all seen these online. You go to enter a credit card and they let you select the month of the expiration date and maybe the year. These are structured fields where you can't enter a wrong value. 
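To make the contrast between free text and structured entry concrete, here is a minimal sketch in Python. The diagnosis strings, the select list and the function names are all made up for illustration; they do not come from any particular EHR.

```python
from collections import Counter

# Hypothetical free-text entries typed by hand (illustrative only).
free_text_entries = ["diabetes", "Diabetes", "diabtes", "diabetes ", "asthma", "ashtma"]

# Hypothetical select list backing a structured field.
ALLOWED_DIAGNOSES = {"diabetes", "asthma"}


def count_free_text(entries):
    """Count entries exactly as typed; every spelling variant gets its own bucket."""
    return Counter(entries)


def validate_structured(value):
    """Accept a value only if it appears on the select list."""
    if value not in ALLOWED_DIAGNOSES:
        raise ValueError(f"{value!r} is not on the select list")
    return value


# Free text: two real conditions are scattered across several buckets.
free_counts = count_free_text(free_text_entries)

# Structured entry: only list values get through, so counts aggregate cleanly.
structured_counts = Counter(
    validate_structured(v) for v in ["diabetes", "diabetes", "asthma"]
)

print(free_counts)
print(structured_counts)
```

The computer treats each misspelling or capitalization variant as a different value, which is why the free-text counts fragment, while the select list rejects anything outside the allowed range before it is stored.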
You may enter a value that is inaccurate but you can't enter values outside the range that's provided to you. This does improve data quality but it can be more time-consuming for clinicians at the point of care. One of the goals in informatics and IT, especially clinical IT, is to reduce the workload on the clinicians. As an example, in EHRs demographic information is usually structured and as a result it is often of high quality. However, due to the complex ways that people think about and report their racial and ethnic identities, those data are often variably reported and frequently incomplete. Another dimension of data is system-generated results. Laboratory results, for example, come from the laboratory devices through which the sample was run. These can be large datasets or a single value. The data may be very specific to an institution or to a particular lab. Some generated data may require substantial processing to become clinically useful. For example, genomic data is a massive volume of raw numbers and values that need to be substantially processed and analyzed to come up with any clinically relevant results. Now let me describe more types of clinical data. First, medication lists are often structured. However, these lists typically are encoded using code lists from the EHR vendor or from a vendor who curates and processes drug data. For example, there is a commercial data curation vendor called First Databank. There is also a uniform, open and free coding system for medications called RxNorm that I mentioned in the last module. Next, there are procedure orders, which are often structured but, again, encoded with proprietary vendor code lists. This makes them very difficult to aggregate across multiple systems. Laboratory results are often the best structured data in the record, but these are usually encoded with proprietary vendor codes. There is also an open coding system for observations, which I discussed in the previous module, called LOINC. 
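As a rough illustration of why a shared coding system matters for aggregation, the sketch below pools lab results from two sites by translating each site's local code into a common code. The site names, the local codes and the crosswalk tables are hypothetical. The code 2345-7 is used here as the LOINC code for serum or plasma glucose, but treat it as illustrative and check it against the LOINC database before relying on it.

```python
from collections import defaultdict

# Hypothetical local-to-LOINC crosswalks, one per site (illustrative only).
CROSSWALKS = {
    "site_a": {"GLU": "2345-7"},
    "site_b": {"CHEM_0042": "2345-7"},
}


def pool_results(results, crosswalks):
    """Group result values by shared code; collect anything that has no mapping."""
    pooled = defaultdict(list)
    unmapped = []
    for site, local_code, value in results:
        shared_code = crosswalks.get(site, {}).get(local_code)
        if shared_code is None:
            unmapped.append((site, local_code, value))
        else:
            pooled[shared_code].append(value)
    return dict(pooled), unmapped


# Each tuple is (site, local code, numeric result); values are invented.
results = [
    ("site_a", "GLU", 95.0),
    ("site_b", "CHEM_0042", 110.0),
    ("site_b", "XYZ", 1.0),  # no crosswalk entry, so it cannot be aggregated
]

pooled, unmapped = pool_results(results, CROSSWALKS)
print(pooled)
print(unmapped)
```

Without the crosswalk, "GLU" and "CHEM_0042" look like unrelated tests. Keeping the unmapped results in a separate list, rather than dropping them silently, makes the gaps in the mapping visible to the analyst.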
The use of LOINC is scarce, and health IT groups mostly use it as an interoperability coding system when data are exchanged between systems. It is almost never used natively within systems, however, and this is a problem. Radiology reports are narrative texts with varying structure. There's no standard coding for radiology reports at this point. Pathology reports are also narrative texts with some structure. There's often very limited use of standard coding such as SNOMED CT. Even though SNOMED has a history going back decades, it's almost never used in pathology and most pathology reports are not really structured. There is an effort by the College of American Pathologists to reward pathologists for using what's called synoptic reporting. This has evolved into a notion of checklists, where specific findings are put into the report as check-off lists. But the adoption of this has been rather slow and few information systems take advantage of it yet. Vital sign data is often entered by hand with varying units of measure, which makes it very difficult to compare. Physical exam findings are also narrative texts. There's no standard format and they're often highly variable. Now let's review the important topic of problem lists. Problem lists in electronic health record systems are common, and these are often based on the problem-oriented medical record advocated by Larry Weed. These are typically encoded with ICD codes. ICD won't have all the clinical problems we might see in routine care, for example chest pain and abdominal pain. If it's not a disease, you won't generally see it in ICD, so you won't be able to express these concepts well on a problem list and have them encoded reliably. In addition, problem lists can often be changed. They can be edited by the end user, which may completely change the meaning. A provider may select an ICD code for asthma and then type "asthma COPD", which are two separate problems. Thus, these concepts should be documented separately. 
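One way an analyst might look for edited problem list entries is to compare the text shown on the list with the description attached to the stored code. The sketch below assumes a hypothetical lookup table and a simplified problem list of (code, display text) pairs. The ICD-10-CM codes J45.909 and J44.9 are used for asthma and COPD with shortened descriptions, so treat both as illustrative rather than as an official code table.

```python
# Hypothetical code-to-description lookup with shortened labels (illustrative only).
CODE_DESCRIPTIONS = {
    "J45.909": "asthma",
    "J44.9": "copd",
}


def flag_edited_entries(problem_list):
    """Return entries whose display text no longer matches the coded description."""
    flagged = []
    for code, display_text in problem_list:
        expected = CODE_DESCRIPTIONS.get(code)
        normalized = display_text.strip().lower()
        if expected is None or normalized != expected:
            flagged.append((code, display_text))
    return flagged


# Each tuple is (stored code, text as it appears on the problem list).
problem_list = [
    ("J45.909", "Asthma"),       # text matches the code
    ("J45.909", "asthma COPD"),  # text was edited to add a second problem
    ("J44.9", "COPD"),           # text matches the code
]

flagged = flag_edited_entries(problem_list)
print(flagged)
```

An exact match after lowercasing is deliberately strict. A real check would need to tolerate harmless wording differences, but even this simple comparison surfaces the case where one code ends up carrying two separate problems.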
This image represents the Healthcare Information and Management Systems Society (HIMSS) electronic health record adoption model. There are seven stages, eight if you include zero. If there is no electronic health record and the organization is 100 percent based on paper, then the organization is rated to [inaudible] or stage zero. Stage one allows a few electronic sources of information, all the way up to stages six and seven, which involve complete electronic medical records with no paper in the clinic or the hospital. All of the continuity of care document transactions are electronic in the ideal stage seven model, and data warehousing is in place for analytics. There has been a dramatic push from the federal government to encourage the adoption of electronic medical records. By the end of 2017, only 6.4 percent of hospitals in the US had achieved stage seven of the HIMSS EHR adoption model, even after years of Meaningful Use standards pushing organizations towards adoption. Even lower adoption levels, around stage two and stage three, do produce a high degree of clinical data available electronically, so you don't need to be at stage seven to do meaningful reuse of clinical data. Clinical data typically exists to document patient care and, of course, to justify payments, but EHRs are not typically created with healthcare analytics in mind. This is changing: Meaningful Use is pushing for standardization of data across most organizations. Many organizations are heading towards HIMSS Analytics stage seven. But all health care systems are different. Even within an organization, misinterpretations of data happen frequently. Across multiple organizations, this is an even bigger challenge. As I mentioned earlier, understanding data elements depends largely on the context in which the data were created. There are specific challenges with the electronic health record clinical narrative. 
EHR clinical notes are often subjected to what we call cut and paste by clinicians. This causes several problems. It perpetuates out-of-date information, and it is not clear if the author is really reflecting upon current patient events. Moreover, it leads to less independent thinking about the case and possibly to poor reliability. For example, data might get copied that is not relevant for that particular case. Notes might have previous lab values that have been automatically inserted. Often, this auto-insertion of data contributes to poor reliability for really no practical reason. The same data is in other parts of the electronic medical record system. The benefit of auto-inserting a lump of lab data in the middle of a clinical note really begs for an explanation. It [inaudible] often only causes confusion. This, however, is a common practice today, especially in training environments. Of course, this leads to problems for analysts who are tasked with extracting and aggregating these data. Well, if clinical data were not complex enough for you, let's move into the next lesson on administrative data. These data have their own histories and sources and are often just as tricky to work with.