Hello again. This is Alejandro Rodriguez, with a new lesson about challenges and problems in biomedical text. As you remember, we are covering the foundations of mining non-structured medical data. So, in this lesson, I'm going to show you very quickly some of the challenges and problems regarding the extraction of information from biomedical text.

As you know, and as I explained before, natural language is the way that we write or speak to other people, or the way we write in a book, in a scientific paper, or in an electronic record, for example. It is basically the way that we communicate with others, so it comes as natural descriptions. Getting information from these natural descriptions is a hard task because it's not straightforward. We are not able to run a query like the one we saw in the electronic records example, about the age or the date of birth, because this is not structured information. The information is in textual form, and it cannot be queried in the normal way.

On top of this particularity of not being able to query natural text directly, we have to take into account that there are several kinds of information to deal with. For example, we can get information from biomedical text about proteins or genes, or about anatomical parts of the human body, or about clinical settings, like drugs, symptoms, and so on. So, we're going to see what the main challenges are in the extraction of information from biomedical texts.

The first thing I would like to show you is a small difference between two fields that are related, and where the text-mining field comes from. Text mining, as you see, comprises the discovery and extraction of knowledge from free text, and it's considered a branch of Information Extraction.
That should not be confused with Information Retrieval. While information retrieval returns documents, information extraction returns facts, okay? So, that's the difference between the two. It's important to clarify this because they are different fields.

So, first of all, I would like to divide the explanation in this lesson into two parts: clinical information, or clinical text, versus biomedical information, or biomedical text. As we saw in the sentence quoted from the original paper, which you have in the references, a lot has been written about the biomedical uses of NLP. But the authors also divide it into these two categories, because it's not the same to extract information from biomedical texts, which are normally written as articles, scientific papers, maybe books, and so on, as to extract information written in a clinical setting, that is, information written by clinicians, by physicians. The information in the two cases is different, because the kind of information is different and the expressions that are used are different. So, we have to divide the lesson into these two sections.

Starting with biomedical text, I'm going to show you very quickly some of the main challenges. The first one is genotype mining. The idea of genotype mining is to recognize mentions of gene or protein names in a text. For example, we may be analyzing a paper that describes some interaction between proteins, or that describes some proteins related to specific genes that are associated with a specific disease, and so on. The idea is to identify these proteins and genes, and to identify the corresponding databases where they are listed, for example; that's one of the possible options.

The second one is phenotype mining. In contrast to the genotype, we can consider the phenotype as the observable expression of a genotype.
For example, imagine that we want to get information about symptoms that derive from, or are part of, a genetic disease, okay? Those are phenotypical manifestations. So, the idea is to recognize these observable characteristics of an individual, which result from the interaction of its genotype with the environment. This specific challenge is mainly addressed with dictionary-based methods.

Pharmacological information. That's one of the main things that can be done in the context of extracting information from biomedical texts: trying to identify drugs or chemicals that are important in treating or causing phenotypes in the course of a treatment. You know that a drug can be referred to by its commercial name, the one issued by a specific pharmaceutical company. But we also have the active ingredient and the excipients that are part of this drug. So, the idea of this challenge is to get this pharmacological information while addressing both parts: the active ingredient as well as the brand name of a specific drug.

We also have the relationship between genotype and phenotype. It's basically trying to get the information where these two terms are related, for example, the phenotype associated with a specific gene. So, the idea is to identify the specific sentences where these two terms are associated, to what degree, and what kind of association is putting them together.

We have the same in the context of drugs with genotypes and phenotypes, I mean genotype-drug mining and phenotype-drug mining. In both cases, it's the same idea: try to get the relationships between the two elements, between the genotype and the drug, and between the phenotype and the drug.

In the context of clinical text, it's a bit different, because the kind of information that is normally expressed in the clinical setting is not the same as in the biomedical one.
Normally, and that doesn't mean there are no exceptions, in clinical text we are not talking about specific proteins or genes, and these kinds of relationships. We are talking about other kinds of information. So, in clinical text, we have to focus on more specific things. That doesn't mean that we can't get the other kind of information in a clinical setting; it can be there, but clinical text is not mostly focused on it. So, these are the kinds of things that we want to get in clinical settings.

For example, contextual information: trying to identify information such as negation, temporality, or event subject identification. Trying to capture, for example, that the report says that the patient has no fever, or that the patient has had fever during the last few days.

Beyond negation, temporality, and so on, there is information in general. By general, we refer to, for example, findings, diagnoses, laboratory test names and results, and so on. All kinds of information that are pretty useful and pretty valuable.

We also want to, for example, identify drugs, okay? Drugs are important to capture, for example, when a specific patient needs a specific treatment. And that's another of the things that we have to deal with.

Codes. It's important to identify the codes or the terms that are mapped to controlled sources, for example, the International Classification of Diseases, the ICD. It's very common that in some reports, instead of the specific name of a disease, a treatment, or a procedure, we have the corresponding ICD code.

And relationships, the relationships between any kind of concepts. For example, the mother of the patient has been diagnosed with breast cancer. So we have the patient, we have that the patient has a mother, and we have that the mother has been diagnosed with this specific type of cancer.
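As a rough illustration of two of the clinical challenges above, negation in contextual information and the identification of codes, here is a minimal Python sketch. It is not the method of any particular system: the cue list `NEGATION_CUES`, the `window` parameter, and the function names are hypothetical, and the regular expression only assumes a simplified ICD-10-like shape (one letter, two digits, an optional decimal part).

```python
import re

# Hypothetical, deliberately tiny cue list; real systems rely on much larger lexicons.
NEGATION_CUES = {"no", "not", "without", "denies"}

# Simplified assumption about the shape of an ICD-10 code:
# one uppercase letter, two digits, and an optional decimal part.
ICD10_LIKE = re.compile(r"\b[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b")


def is_negated(sentence, term, window=4):
    """Return True if a negation cue appears within `window` words before `term`."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    term = term.lower()
    if term not in tokens:
        return False
    position = tokens.index(term)
    preceding = tokens[max(0, position - window):position]
    return any(token in NEGATION_CUES for token in preceding)


def find_icd10_like_codes(text):
    """Return every substring that looks like an ICD-10 code under the assumed pattern."""
    return ICD10_LIKE.findall(text)


if __name__ == "__main__":
    print(is_negated("The patient has no fever.", "fever"))
    print(is_negated("The patient has had fever during the last few days.", "fever"))
    print(find_icd10_like_codes("Discharge diagnosis J18.9 and E11 noted."))
```

A window of words before the term is a very crude stand-in for real negation scope, and a pattern match says nothing about whether a code is valid, so this only shows the general shape of the problem.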
This kind of relationship information is also very useful to extract, and it is one of the main challenges in clinical text extraction. So, these are some of the references that have been used. I strongly recommend that you take a look, because there are some survey papers that are really interesting, and they give more information about all of these challenges, as well as some frameworks and techniques that have been used before. So, again, thank you very much, and see you in the next lesson.