Hello, in this module we will talk about the common clinical data sources and how data is captured in them. We will also talk about the challenges you will face with this new big data initiative and with population-level data. In this video, we'll talk about insurance claims and electronic health records. The key takeaway of this video is being able to describe the advantages and disadvantages of both insurance claims and EHRs, or electronic health records. For insurance claims, the first advantage is that they cover a broad scope of data across all providers for a patient. So regardless of where the patient went, as long as the service was claimable, all of the diagnoses and medication records will show up in claims. Claims, because of the tight rules on how they are reported, usually have higher consistency across different sources. Basically they all use something called a unified form. And also, because there are large insurance companies here in the US, those insurance companies have large-scale claims databases that cover a large population of interest. So if you want millions and millions of patients, an insurance company may have all of their claims, while health systems that have EHRs usually do not have such large coverage. Now, there are also disadvantages with insurance data. The obvious one is that it only includes demographics, diagnoses, and medications. So it misses a lot of other things like lab tests, vital signs, problem lists, family history, and a lot of other things that do exist in EHRs. Insurance data is also limited to insured patients. So for patients who do not have insurance or can't afford insurance, no data exists in claims data.
The same goes for non-covered services: if something is rejected and is not covered under the insurance, it never shows up in claims data. Now, insurance data can come from multiple sources, but basically it boils down to whether it comes from a hospital setting, from an outpatient setting, or from medications. So there are three broad categories here. As you can see here, hospital claims are usually referred to as facility claims. They can come from the inpatient side of a hospital, which is when you're admitted to the hospital, or from the outpatient facilities associated with that hospital, such as the emergency room, ambulatory surgery, or other services within the hospital setting. In order to report an insurance claim for facilities, usually a unified form is used, called the Uniform Billing form, or UB-04. There are also other names; as you can see, CMS has its own name for it (CMS-1450). But basically, it was published by the National Uniform Billing Committee and almost all providers use it these days. That's how they send all of their billing data to an insurer to be reimbursed for the services they have provided. But remember, although I will show you the forms in the next slides, almost every provider these days does this electronically, so the paper form doesn't really exist anymore and nobody writes on it. Everything is computerized, but the fields that are reported are exactly the same fields. Here is one of those forms that I just talked about, for reporting the facility charges. You can see it has multiple sections, and in the database that is transferred to the insurance company for billing purposes, they will receive all of these data elements, such as provider information, discharge information, and who the patient was. There are also all of the different codes, such as revenue codes and diagnostic and procedure codes, and the rest of the codes that you can see on screen.
Here is the list of the data elements in the UB-04 form that you can also expect to see in a claims database. I don't want to go through the whole list, but you can study it later. Now, the second type of insurance data is called professional claims, sometimes referred to as physician claims. You need to remember that medical care is not only given by physicians; it's also given by nurse practitioners, respiratory therapists, physical therapists, and so on. So the correct name should be professional claims. They also use a very specific form for reporting. You can see that the Centers for Medicare & Medicaid Services, or CMS, uses a very specific form for receiving this information and reimbursing professional health care providers for the services they have provided. It is important to note that for a hospital admission there might be two claims. One is the facility or hospital claim that we talked about, and the other one, on top of it, is for the professional services, because physicians might perform their own procedures and want their own reimbursement. So sometimes you see it all mixed together in an insurance database, and you need to be able to find out what type of charge was made for a given encounter. Here is the form for physician insurance claims. You can see, again, that the patient's information is there, the payer's information is there, and so are the procedures that were performed and the diagnostic codes. Here is the detailed list of variables used to claim a physician procedure or diagnosis. You can see it is a long list of variables. You also need to look at the form itself, because it has some legacy limitations. For example, no more than 12 diagnostic codes can be reported for an encounter, and that limit has stayed even in the digital format of these forms.
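To make the point about facility and professional claims being mixed together a bit more concrete, here is a minimal sketch of how you might separate them for a given encounter. The field names (`claim_type`, `patient_id`, `admit_date`, `charge`) and the sample rows are assumptions made up for illustration; a real claims database will have its own schema and its own way of flagging the claim type.

```python
from collections import defaultdict

# Hypothetical claim rows; the field names and values are illustrative only
# and do not come from any real claims schema.
claims = [
    {"claim_id": "C1", "patient_id": "P1", "admit_date": "2020-03-01",
     "claim_type": "facility", "charge": 12000.0},
    {"claim_id": "C2", "patient_id": "P1", "admit_date": "2020-03-01",
     "claim_type": "professional", "charge": 1500.0},
    {"claim_id": "C3", "patient_id": "P2", "admit_date": "2020-03-05",
     "claim_type": "professional", "charge": 200.0},
]


def split_by_encounter(rows):
    """Group claims by (patient, date) and total the charges per claim type."""
    encounters = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = (row["patient_id"], row["admit_date"])
        encounters[key][row["claim_type"]] += row["charge"]
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(totals) for key, totals in encounters.items()}


summary = split_by_encounter(claims)
for encounter, totals in summary.items():
    print(encounter, totals)
```

In this made-up data, the hospital admission for patient P1 carries both a facility charge and a professional charge, while patient P2 only has a professional claim. Using patient and date as the encounter key is a simplification; real data often needs a proper encounter or admission identifier.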
There might be other limitations like this in the forms. Usually those limitations don't change your research outcomes much, but you should be aware of limitations that the structure of these forms might impose on claims databases. Claims might also include pharmacy data, or basically medications. This data is very transactional, because at any given time a lot of medications are being filled. When you're in the pharmacy, they run the claim, they get a response very fast, and then they dispense the medication. It's very different from medical claims. There are also special forms for this, but almost everything is now electronic, and for a lot of drugs the NDC code, or National Drug Code, is used for reimbursement purposes. So if you look at these databases, almost everything is NDC, because NDC codes also specify the package and the number of medications in that package, which relates to how much you need to pay. And NDC codes are actually updated almost every week by the Food and Drug Administration here in the US. The Rx claim may also include some descriptive information, but that is usually missing, and there are some data quality issues with it. And remember, if a medication is prescribed but the patient doesn't use his or her insurance to claim it, then it never shows up in these databases. So if somebody pays out of pocket, or if it's an over-the-counter medication, you usually don't see it in claims databases. Here are the forms for reporting medications, but almost nobody uses them. It's just good to see that these forms did exist, but now almost every medication claim goes through electronic infrastructure here in the US. In the pharmacy claims process there are intermediaries called Pharmacy Benefit Management companies, which basically process the data before it goes to the main insurer. They're called PBMs, and they often have very large data sets on medications for a large population.
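Since NDC codes come up so often in pharmacy claims, here is a small sketch of how the structure mentioned above (the code identifies the package as well as the drug) shows up in practice. It assumes the NDC arrives as a hyphenated 10-digit code with three segments (labeler, product, package) in one of the 4-4-2, 5-3-2, or 5-4-1 layouts, and pads it to the 11-digit 5-4-2 layout that billing systems commonly use. The function name and the sample codes are made up for illustration.

```python
# Segment-length layouts for hyphenated 10-digit NDCs (labeler-product-package).
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def normalize_ndc(ndc: str) -> str:
    """Pad a hyphenated 10-digit NDC to the 11-digit 5-4-2 form (no hyphens)."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected labeler-product-package, got {ndc!r}")
    layout = tuple(len(p) for p in parts)
    if layout not in VALID_LAYOUTS:
        raise ValueError(f"unsupported NDC layout {layout} in {ndc!r}")
    labeler, product, package = parts
    # Left-pad whichever segment is short with a zero.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Made-up codes, one per layout.
print(normalize_ndc("1234-5678-90"))  # 01234567890
print(normalize_ndc("12345-678-90"))  # 12345067890
print(normalize_ndc("12345-6789-0"))  # 12345678900
```

The last segment is the package code, which is why two claims for the same drug in different package sizes carry different NDCs. Codes that arrive without hyphens cannot be padded safely this way, because the segment boundaries are ambiguous.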
You can see that I've listed some of the PBMs here. They provide services to a long list of insurance companies, and hence they have population-level data on the medications dispensed here in the US. Now, there are also non-claims sources of medication data, such as EHRs. In previous videos I've talked about Surescripts, which is a network for e-prescribing in the US, and also PDMPs, or Prescription Drug Monitoring Programs, which are central databases in each state that try to monitor controlled medications because of the possibility of abuse. So far we have talked about insurance claims, in their hospital, outpatient, and medication forms, but there is also EHR data. And because EHRs are widely adopted here in the US, we do have population-level data from EHRs. There are, of course, advantages and disadvantages. The advantage of EHR data is that it includes a lot of data about the patient and goes beyond just demographics, diagnoses, and medications. It includes lab values, vital signs, social data, problem lists, and a lot of clinical notes, which are free text about the patient. So you have much deeper information about the patient and about what happened in that encounter. It also includes procedures and medications that are not covered by insurance. So if somebody who doesn't even have insurance shows up in a medical office, the data is still in the EHR, although there is no insurance claim to capture that data. And for medications, for example, a physician might order an over-the-counter medication, or OTC; you go and pay out of pocket to get it, and it never shows up in your claims. So you obviously need to find out what the differences between claims and EHR data are in order to work better with that data. As I said, EHR data also includes both insured and uninsured patients. Now, there are disadvantages. The most important one is something called EHR leakage.
EHR leakage means that the data collected about a patient at the point of care is only for that point of care and that encounter. If the patient goes somewhere else, to another provider, and that provider's EHR is not connected to the EHR of this provider, then the data stays siloed in each of the EHRs, and that means you may not know about things that happened to the patient elsewhere. On the claims side, if all of these visits or encounters were claimable, then everything shows up in the claims database. So that's the problem with EHRs, and it's called EHR leakage. Because of this, and because all of the EHRs have a somewhat different structure, there is also lower consistency across different data providers, especially when we get into issues of free text and so on. The last item, which is changing, is that large-scale EHR data sets don't really exist yet. But there are some providers and some activities working to make EHRs interoperable and to build very large databases that are fed by EHR data for a large population. So I would say that in ten years we might actually have reliable population-level EHR data sets as well. Now, Meaningful Use and the HITECH Act here in the US funded a lot of EHR adoption across providers, and that helped increase the number of providers that are using EHRs. But along with the funding came certain criteria on what the EHR needs to have and how you should use it. For example, there are minimum requirements for the data types that should be recorded in these EHRs, such as patient demographics, medical history, problem list, diagnoses, procedures, allergies, and medications. That makes the data a bit more consistent, but there are still a lot of nuances if you go into the details. There are variations in data quality, especially when you go from one EHR to another.
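As a rough illustration of the EHR leakage idea described above, here is a minimal sketch that compares the encounters seen in claims with the encounters recorded in one provider's EHR. The function name, the use of (patient, date) pairs as encounter keys, and the sample data are all assumptions for illustration; this is one simple way to think about it, not a standard measure.

```python
def leakage_rate(claims_encounters, ehr_encounters):
    """Share of claimed encounters that are missing from one provider's EHR."""
    claimed = set(claims_encounters)
    if not claimed:
        return 0.0
    missing = claimed - set(ehr_encounters)
    return len(missing) / len(claimed)


# Made-up (patient_id, visit_date) pairs.
claims_encounters = [
    ("P1", "2020-01-10"),
    ("P1", "2020-02-03"),  # visit to another provider
    ("P1", "2020-04-20"),  # visit to another provider
    ("P2", "2020-03-15"),
]
ehr_encounters = [
    ("P1", "2020-01-10"),
    ("P2", "2020-03-15"),
]

rate = leakage_rate(claims_encounters, ehr_encounters)
print(f"{rate:.0%} of claimed encounters are not in this EHR")
```

Here two of the four claimed encounters happened elsewhere, so half of the patient activity visible in claims is invisible to this one EHR. Keep in mind that the comparison only works in this direction for claimable encounters; anything paid out of pocket would be missing from the claims side instead.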
There are also limitations in terms of information that may exist in one EHR but not in another, depending on what kind of practice each provider has. These are the common population-level health data types that exist in EHRs. This is not an exhaustive list, just a sample. Probably the most common ones used for population health research are demographics, diagnoses, problem lists, procedures, prescriptions, family and social histories, vital signs, immunizations, and allergies, and on an ad hoc basis some surveys, automated data, or free text as needed. There are also non-traditional sources of population-level health data. Again, it depends on which state or which country you are in whether you have this data or not. There might be patient-provided or patient-generated data sources, like all of these smartphones generating data and maybe having it all in one place. There are a lot of public health or vital records data, social services data, environmental data, and geographical data sources. And there are other data sources, as you can see on this list, that might be available at a population level, so that you can query these large databases to find certain trends as needed. So in summary, we talked about the common data sources, specifically insurance claims and EHRs, and also touched on some non-traditional data sources. Thank you.