Patients often visit the same health center or health system many times throughout their lives, and health data continue to accumulate in their EHR over time. From a data perspective, this means that there may be multiple instances or measurements of the same data element within a single patient record. So far, we have essentially been identifying ever/never cases: did the patient ever have this code or that lab measurement? This is useful, but there are a number of more advanced ways to use multiple measurements to our advantage in clinical data science. This video will introduce you to methods for handling these repeated measurements. There are three primary types of data manipulations you can consider for any single data type: frequency, or how many times a data element occurred; temporal, where you consider the timing of the data element; and value, where you mathematically manipulate the measurement values. Let's start with frequency manipulations. Especially when a patient has a chronic condition, they may have the same billing code, laboratory test, or medication recorded many times in their medical record. The simplest manipulation is just to count how many times a particular data element has occurred in their record. There are three types of frequency manipulations you should know. First, you can use the raw count of how many times a data element occurred. For instance, how many times did the patient receive a particular ICD code? This is an extremely popular data manipulation technique in machine learning algorithms. However, one limitation of this technique is that it can bias algorithms toward patients who have more severe disease or who simply get more frequent medical care. Because of this limitation of counts, you could instead use the second frequency manipulation, weighted frequency.
In this manipulation, you take the number of times an ICD code occurs and divide it by the total number of ICD codes or the total number of encounters the patient has had. The denominator you choose affects how you interpret the result. If you weight by the total number of ICD codes, you are adjusting for both the frequency of care and the complexity of the patient's medical state. Weighting by visit count simply adjusts for the total amount of care they receive. The third frequency manipulation is thresholding. With a threshold manipulation, you require a particular data element to be present at least some minimum number of times. We use thresholding frequently in computational phenotyping, where we're only trying to determine the presence of a disease or outcome. By requiring some minimum number of billing codes, we reduce the likelihood of a false positive due to a coding error or to a code recorded during the diagnostic process, for example while a condition is being ruled out. Comparing these three frequency manipulations, counts assume that importance increases linearly as more data accumulate. Weighted frequency tries to adjust for how often the patient uses the health care system. And thresholding assumes that additional measurements add little value once a minimum frequency has been reached. Frequency manipulations are most often applied to billing diagnosis data, but they could be applied to any of the data types available in an EHR. In addition to how often a data element occurred, you can also consider its timing, or when it occurred. This is done with temporal manipulations. If you have multiple measurements of, let's say, a lab value or a clinical observation, it can be difficult to determine which value you should use in your algorithm. Temporal manipulations are fairly straightforward. You could use the first value or the last value in the record, or you might consider the timing of the event in relation to some other data element.
Let's say you were looking at medication side effects. You might take the first lab value after the first prescription for that medication. Temporal manipulations require you to think carefully about which measurement is the most meaningful, because you are selecting only a single instance from the whole record. However, more than one measurement may be meaningful for your algorithm. In this case, you would want to consider value manipulations. Value manipulations allow you to consider the results of multiple measurements over time. As with frequency manipulations, there are a number of value manipulations to choose from. First, there are traditional mathematical manipulations. When you have many measurements of the same laboratory test or clinical observation, you could simply take the minimum or maximum value, or you could calculate the mean of the values. But beware: the mean is easily biased by extreme measurements, so you'll often see data scientists use the median instead to reduce the impact of these extreme values. Depending on the data type, the mode, or the most frequent value seen across all measurements, may make sense, but I haven't seen it used much in practice. You may also use value thresholding. In this case, if any of the measurements are above or below a certain value, or fall outside a reference range, you consider them meaningful. Just like with frequency thresholds, you can make this stricter by requiring multiple measurements to meet the threshold criterion. Of course, these three types of manipulations can be combined in many ways. Drug side effects are one example: you might take any lab measurement over a certain threshold that occurs within one year after the first medication prescription. This algorithm combines temporal and value threshold manipulations.
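To make the three families of manipulations concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical, simplified record format (one `Event` per dated entry) and made-up toy values for a single patient; the codes, the lab name `ALT`, the drug name `drug_x`, the cutoffs of 60 and 100, and the rule that each distinct date counts as one visit are illustrative assumptions, not recommendations. It shows raw counts, weighted frequencies, and a count threshold; first, last, and first-after-prescription values; minimum, maximum, mean, median, and a value threshold; and the combined one-year, above-threshold side-effect rule described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, median
from typing import Dict, List, Optional


@dataclass
class Event:
    """One dated entry in a toy patient record (hypothetical structure)."""
    day: date
    code: str                      # billing code, lab name, or drug name
    value: Optional[float] = None  # numeric result for labs; None for codes


# Made-up toy record for a single patient; not real data.
dx = [Event(date(2020, 1, 5), "250.00"), Event(date(2020, 1, 5), "401.9"),
      Event(date(2020, 3, 2), "250.00"), Event(date(2020, 6, 1), "401.9"),
      Event(date(2021, 2, 9), "250.00")]
rx = [Event(date(2020, 3, 2), "drug_x")]
labs = [Event(date(2020, 1, 5), "ALT", 30.0), Event(date(2020, 4, 1), "ALT", 95.0),
        Event(date(2020, 9, 15), "ALT", 40.0), Event(date(2021, 6, 1), "ALT", 120.0)]


# --- Frequency manipulations ---
def raw_count(events: List[Event], code: str) -> int:
    return sum(1 for e in events if e.code == code)


def weighted_by_codes(events: List[Event], code: str) -> float:
    # Denominator: all billing codes in the record (frequency + complexity).
    return raw_count(events, code) / len(events) if events else 0.0


def weighted_by_visits(events: List[Event], code: str) -> float:
    # Assumption: each distinct date with a billing code counts as one visit.
    visits = len({e.day for e in events})
    return raw_count(events, code) / visits if visits else 0.0


def meets_count_threshold(events: List[Event], code: str, k: int = 2) -> bool:
    return raw_count(events, code) >= k


# --- Temporal manipulations (select a single measurement) ---
def first_value(events: List[Event]) -> Optional[float]:
    return min(events, key=lambda e: e.day).value


def last_value(events: List[Event]) -> Optional[float]:
    return max(events, key=lambda e: e.day).value


def first_value_after(events: List[Event], anchor: date) -> Optional[float]:
    later = [e for e in events if e.day > anchor]
    return min(later, key=lambda e: e.day).value if later else None


# --- Value manipulations (use every measurement) ---
def summarize(events: List[Event]) -> Dict[str, float]:
    vals = [e.value for e in events]
    return {"min": min(vals), "max": max(vals),
            "mean": mean(vals), "median": median(vals)}


def n_above(events: List[Event], cutoff: float) -> int:
    return sum(1 for e in events if e.value > cutoff)


# --- Combined: temporal window + value threshold ---
def any_above_within(events: List[Event], anchor: date, cutoff: float,
                     days: int = 365) -> bool:
    end = anchor + timedelta(days=days)
    return any(anchor < e.day <= end and e.value > cutoff for e in events)


if __name__ == "__main__":
    start = min(e.day for e in rx if e.code == "drug_x")  # first prescription
    print(raw_count(dx, "250.00"))              # 3
    print(weighted_by_codes(dx, "250.00"))      # 0.6
    print(weighted_by_visits(dx, "250.00"))     # 0.75
    print(meets_count_threshold(dx, "250.00"))  # True
    print(first_value(labs), last_value(labs))  # 30.0 120.0
    print(first_value_after(labs, start))       # 95.0
    print(summarize(labs))  # {'min': 30.0, 'max': 120.0, 'mean': 71.25, 'median': 67.5}
    # 60.0 and 100.0 below are arbitrary illustrative cutoffs.
    print(n_above(labs, 60.0))                  # 2
    print(any_above_within(labs, start, 60.0))  # True
    print(any_above_within(labs, start, 100.0)) # False: the 120.0 is outside the window
```

The last call returns False even though the record contains a value above 100: the combined rule only looks inside the one-year window after the first prescription, so the 2021 measurement is excluded.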
Now that you have these manipulation techniques in your toolbox, let's take a closer look at which techniques are appropriate for each of the data types you will see as a data scientist. First up is billing data. Although ICD-9 codes look like numbers, diagnosis and procedure code values should not be manipulated with value manipulations, because they are categorical assignments. Frequency and temporal manipulations, however, are frequently used with this data type. Procedure codes are usually used simply as ever/never variables in computational phenotyping, but for more complex conditions, you may apply temporal limits or frequency thresholds. In computational phenotyping algorithms, diagnosis codes almost always come with some type of frequency threshold. By requiring two or more occurrences of a single ICD code, your algorithm should have a sufficient balance of sensitivity and specificity for preliminary or hypothesis-generating analyses. Specific counts or weighted frequencies of diagnosis codes are rarely used in computational phenotyping algorithms, but they are pretty common in machine learning algorithms. Temporal selection of codes is typically used only for more complex phenotypes where you are looking for a particular sequence of events. For laboratory data and clinical observations, you may see any of the data manipulations we described, but frequency-based manipulations are rarely used in practice outside of machine learning algorithms. Value manipulations are the most common manipulation used for these data, with a fairly equal balance between mathematical and threshold value manipulations. Temporal manipulations, especially those that relate lab measurements or observations to other data types, are also relatively common. Medication data is most often used as an ever/never variable. However, you may see some algorithms or analyses that look at specific doses of a drug.
In this case, temporal and value manipulations may be used to determine the likely dose at a particular point in the record. You may also see frequency manipulations of prescription or medication dispensing records that try to assess whether the patient is routinely taking a medication as prescribed. Finally, keyword searches of clinical notes are also usually treated as ever/never variables. Frequency and temporal manipulations may be applied, but this is rare in practice. That was a lot of content. You may be wondering how to put all of these manipulations into practice in your analytics. We will have a reading that shows you code samples for performing these data manipulations. Just remember that the most important thing in data science is to pick your manipulations based on your analytic question, not on which ones are easiest to implement.
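The medication manipulations mentioned above can be sketched the same way. This is a hypothetical example, assuming each dispensing record carries a fill date and a daily dose in milligrams, that one fill is expected every 30 days, and that filling at least 80% of the expected fills counts as "routine"; these parameters and the toy history are illustrative assumptions, not clinical standards. `dose_on` combines a temporal and a value manipulation to find the likely dose at a given date, and `looks_routine` applies a frequency manipulation to the dispensing records.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Fill:
    """Hypothetical dispensing record: fill date and daily dose in mg."""
    day: date
    dose_mg: float


def dose_on(fills: List[Fill], when: date) -> Optional[float]:
    # Temporal + value: the dose from the most recent fill on or before `when`.
    prior = [f for f in fills if f.day <= when]
    return max(prior, key=lambda f: f.day).dose_mg if prior else None


def fills_in_window(fills: List[Fill], start: date, days: int) -> int:
    # Frequency: number of fills in the half-open window [start, start + days).
    end = start + timedelta(days=days)
    return sum(1 for f in fills if start <= f.day < end)


def looks_routine(fills: List[Fill], start: date, days: int = 360,
                  supply_days: int = 30, min_fraction: float = 0.8) -> bool:
    # Assumption: one fill is expected every `supply_days` days.
    expected = days // supply_days
    return fills_in_window(fills, start, days) >= min_fraction * expected


# Made-up toy history: 10 mg monthly, then 20 mg from July, with missed months.
fills = [Fill(date(2020, m, 10), 10.0) for m in range(1, 7)]
fills += [Fill(date(2020, m, 10), 20.0) for m in (7, 8, 10)]

if __name__ == "__main__":
    print(dose_on(fills, date(2020, 3, 1)))    # 10.0
    print(dose_on(fills, date(2020, 9, 1)))    # 20.0
    print(dose_on(fills, date(2019, 12, 31)))  # None
    print(fills_in_window(fills, date(2020, 1, 10), 360))  # 9
    print(looks_routine(fills, date(2020, 1, 10)))         # False: 9 < 0.8 * 12
```

Lowering `min_fraction` to 0.75 would flip the last result to True, which illustrates how much these thresholds depend on the analytic question.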