First, let's start with electronic health record (EHR) data. The adoption of EHR systems among US hospitals is getting close to 100%. This is already dated information, as you can see, from 2007 to 2015 or 2016, and by then the adoption of EHR systems had already gone above 90%. These two curves capture two different types of EHR. The red curve is EHR with just very basic functionality, and the green curve is certified EHR, which captures some important functionality required by the certification process. Today, the adoption of EHR systems is more or less 100% in the US. So what information is captured in EHR data? EHR data is really a computerized version of a paper chart that documents patient encounters. It should provide a digital record that covers comprehensive health information about patients. For example, it captures demographic information about the patient, the medications they're taking, doctors' notes, continuous monitoring data, different medical codes such as diagnosis codes and medication codes, and medical images. When you think about EHR data, you should usually think of it as longitudinal data. For example, a patient has four visits, v1, v2, v3, and v4. Depending on the type of visit, they can be either point events like v1, v3, and v4, which happen at one specific time, t1, t3, and t4, or interval events. For example, v2 starts at t2a and finishes at t2b. So why are there these different types of visits in EHR data? First, the point events are the outpatient visits, or outpatient encounters. They capture a very simple clinic visit. For example, the patient comes in for an annual checkup, or they have a minor condition that requires getting some prescriptions, right? So that's a very simple visit that happens on a given day.
And in that kind of outpatient visit, the record captures the symptoms the patient had, the diagnosis the doctor gave, the lab tests that were taken, and the medication prescriptions given to the patient. Then there are the interval events, which are inpatient visits or encounters. Those are more serious visits and involve an admission process and a discharge process. An inpatient visit contains all the information that is documented in an outpatient visit, but it also has some additional information, such as the admission time and the chief complaint the patient had when they first got to the hospital. Then during the inpatient visit, various kinds of monitoring data are captured, such as ECG or EEG data, and some imaging tests may be done in this timeframe. Finally, the patient gets better and is discharged, and at discharge time they get a discharge summary that summarizes everything that happened during the entire visit. So when we think about this longitudinal EHR, we should think about it as a hierarchical structure. At the top we have a patient, and at the next level we have different visits over time. In this case, we have multiple visits, e1, e2, up to et. Each visit has a number of symptoms or diagnoses that were given to this patient, and those diagnoses lead to different medications or procedures, which is the next level. So as you can see, there are multiple levels of information captured in this longitudinal EHR data. This course is about deep learning for health care. A lot of this process will involve analyzing various kinds of patient data or health care data. In this case, with EHR data, we want to convert the EHR data into some kind of embedding. An embedding is really a vector representation of patients, of visits, or of specific medical codes.
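To make the hierarchy concrete, here is a minimal Python sketch of the patient, visit, diagnosis, and treatment levels described above, with point (outpatient) and interval (inpatient) visits. The class names, field names, dates, and code strings such as "DX_PNEUMONIA" are all made up for illustration; this is not how any particular EHR system stores its data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Diagnosis:
    code: str                                            # hypothetical placeholder, not a real medical code
    treatments: List[str] = field(default_factory=list)  # medications or procedures for this diagnosis


@dataclass
class Visit:
    start: date                      # visit day (point event) or admission day (interval event)
    end: Optional[date] = None       # discharge day; None means an outpatient point event
    diagnoses: List[Diagnosis] = field(default_factory=list)

    @property
    def is_inpatient(self) -> bool:
        return self.end is not None

    @property
    def length_of_stay_days(self) -> int:
        # Point events have no duration; interval events span admission to discharge.
        return (self.end - self.start).days if self.end is not None else 0


@dataclass
class Patient:
    patient_id: str
    visits: List[Visit] = field(default_factory=list)

    def timeline(self) -> List[Visit]:
        # Longitudinal view: visits ordered by start time.
        return sorted(self.visits, key=lambda v: v.start)


# Four visits as in the lecture: v1, v3, v4 are point events, v2 is an interval event.
patient = Patient(
    patient_id="p001",
    visits=[
        Visit(date(2015, 1, 12), diagnoses=[Diagnosis("DX_ANNUAL_CHECKUP")]),
        Visit(date(2015, 3, 2), date(2015, 3, 8),
              diagnoses=[Diagnosis("DX_PNEUMONIA", ["RX_ANTIBIOTIC", "PROC_CHEST_XRAY"])]),
        Visit(date(2015, 6, 20), diagnoses=[Diagnosis("DX_COUGH", ["RX_COUGH_SYRUP"])]),
        Visit(date(2016, 1, 15), diagnoses=[Diagnosis("DX_ANNUAL_CHECKUP")]),
    ],
)

for i, visit in enumerate(patient.timeline(), start=1):
    kind = "inpatient" if visit.is_inpatient else "outpatient"
    codes = [d.code for d in visit.diagnoses]
    print(f"v{i}: {kind}, {visit.length_of_stay_days} day(s), diagnoses={codes}")
```

A nested structure like this is the raw input that a model would later turn into embedding vectors for the patient, for each visit, or for each code.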
And once we have those embeddings, we can use them to support tasks such as predictive modeling, generating other types of data, data augmentation, clustering, and so on. So deep learning is a process of converting the original raw EHR data into embedding vectors of whatever you want to study. Then, using those embedding vectors, we can do the final task much more easily. So what are the properties of EHR data? Here are the pros and cons. For the pros, it's very rich data: it contains both structured data and unstructured data. And it's longitudinal, so you don't just observe what the patient looks like at one time point. You can actually see multiple visits over time, and this temporal evolution is an important thing that is captured in EHR data. And what are the limitations? It's complex, because it contains multiple modalities of data: structured codes, numerical data, time series data, images, and clinical notes, which are text. So it's a very complex data set. It's also siloed data: each hospital has its own EHR system, and they're not really linked across hospitals. So patient data from one hospital is really just a partial view of what happened to those patients. If a patient visits other hospitals or clinics, you will not easily know that by looking at EHR data from one hospital. And the data is very sensitive because of privacy concerns. It documents very important and sensitive information about individuals, so it's difficult to share or even to gain access to for research. One important type of data in electronic health records is clinical notes. These are really free-text documentation of patient status written by providers such as doctors or nurses.
There are various types of notes, depending on who writes them and at what step of the health care service: admission notes, ED notes (ED stands for emergency department), progress notes, nursing notes, radiology notes, other types of reports such as ECG notes and echocardiogram notes, physician notes, discharge summaries, respiratory notes, and social worker notes, right? All of these contain different information about patients, and they also follow different formats. For example, the progress note is one of the most common types of notes, and it often follows a structure called SOAP, which stands for subjective, objective, assessment, and plan. So there are four sections. The subjective section is what the patient tells you. The patient may complain about a fever or a headache, so that's the subjective part. The objective part is what you examine. You can do some lab tests or record other findings you identify objectively, so those go in the objective section. Then you have your assessment, which is the section where you provide the diagnosis. And then the plan. The plan is really about treatment: what should the patient do? Should they take some medications or undergo some procedures? So that's the plan. This is just one type of note, the progress note. For different types of notes you have different templates. So what are the properties of clinical notes? Here are the pros and cons. The advantage of this type of data is that it's very detailed. It describes all kinds of information about patient status. And it's universal: whenever you talk about EHR data, clinical notes are an important part of all clinical encounters. It's also very flexible, because you can put all sorts of information into this free-text format, such as symptoms, lab tests, or whatever the patient tells you or whatever the doctor wants to document in the notes. As for the limitations or disadvantages of this type of data, again, it's unstructured.
So it's free-text narrative, which can be difficult to process, especially for machines. That's why clinical natural language processing is an important research area. It's noisy, because doctors and nurses are under a lot of time pressure to produce these documents, so the documentation quality can vary. A lot of acronyms are used and typos happen. These are really not well-written articles. They're more like notes for doctors or nurses to remember what happened during the encounter with this patient. And again, it's sensitive, even more so, because free text can contain anything, especially protected health information such as a patient's age, social security number, and address. Anything can happen and be documented in these clinical notes, and that's why they're also very sensitive.
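As a small illustration of the SOAP structure described earlier for progress notes, here is a minimal Python sketch that splits a note into its four sections. It assumes each section begins on its own line with an explicit header such as "Subjective:", which is a simplification. As just discussed, real notes are noisy and often do not follow a clean template. The function name split_soap and the example note are made up for illustration.

```python
import re
from typing import Dict

SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")

# Assumption: each section starts a line with its full name followed by a colon.
HEADER = re.compile(
    r"^(subjective|objective|assessment|plan)\s*:\s*",
    re.IGNORECASE | re.MULTILINE,
)


def split_soap(note: str) -> Dict[str, str]:
    """Return the text of each SOAP section; missing sections map to ''."""
    sections = {name: "" for name in SOAP_SECTIONS}
    matches = list(HEADER.finditer(note))
    for i, match in enumerate(matches):
        # A section runs from the end of its header to the start of the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[match.group(1).lower()] = note[match.end():end].strip()
    return sections


# Made-up example note, not real patient data.
example_note = """Subjective: Patient reports fever and headache for two days.
Objective: Temperature 38.5 C. Lab tests ordered.
Assessment: Likely viral infection.
Plan: Rest, fluids, follow up in one week."""

parsed = split_soap(example_note)
for name in SOAP_SECTIONS:
    print(f"{name}: {parsed[name]}")
```

A rule-based splitter like this breaks down quickly on acronyms, typos, and missing headers, which is part of why clinical natural language processing is its own research area.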