Hello again. This is Ernestina; you remember me from the first lessons. Across the many lessons of this course, you have been analyzing how to deal with unstructured data. But now the question is, once we have this structured data, what can we do with it to extract patterns from it? This is what we are going to cover in the following lessons: the process of data mining and the techniques of data mining. So data mining is a term that was coined in the '80s. It was used at a conference in which Piatetsky-Shapiro defined the process of knowledge discovery in databases, and he coined this term. He defined the process of knowledge discovery in databases as the non-trivial process, and I put the emphasis on non-trivial, because we will see that this is something very important. So once again, it is the non-trivial process of finding patterns in data that have to be novel, useful, and understandable. And this is our major goal. We want to find patterns in our medical data that are useful. Because remember that we have many opportunities and many challenges in the medical domain, and our main goal is to reduce costs and to improve the healthcare sector in general. Throughout history, we have given the process of extracting patterns from data many, many names. In fact, we started by calling it data fishing, in the period in which we were using a lot of statistics, in the early '60s. Then, as I have mentioned, in the '80s we started calling it knowledge discovery or data mining. And some people even today think that this term is outdated, because now we have other terms like data science, big data, and data analytics. It was in the late '90s and at the start of this century that people started talking about data science. In the machine learning community, they were talking more about supervised methods and unsupervised methods.
But the thing is, whatever name we may use for all these processes, what we have to make clear is that we are focusing on just one of the Vs of big data, that is, extracting value. Because extracting value means extracting patterns from the data that we have collected. And this is the major aim of whatever you call it: data mining, data dredging, big data, data science. Okay, so this is the main thing that I want you to take into account. So once we know that what we want is to extract the patterns hidden in data, what are the main kinds of patterns that we can extract, or what are the main problems that we can find when we want to extract these patterns? Even though we can have many, many business problems in the healthcare domain, or in any other domain that we may think about, the good news that I have for you today is that we have two major objectives. First of all, we have patterns that are for description. The main goal behind these patterns is just to describe the population. If we have a population of patients, then we will have patterns that describe that population. And in describing the population, we are going to distinguish between two kinds of patterns. There are patterns in which we try to find groups in the population, and this is what we are going to call clustering. And there are patterns in which we try to find associations between the variables that describe the patients, or describe the treatments, or describe any other thing that we have in our database. This one, in which we find associations, we call association pattern mining. Okay, so one of the major objectives is description, and the second one is prediction. In the prediction objective, what we are looking for is a pattern that describes and is able to predict one of the variables that we have in our models.
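To make the descriptive side a bit more concrete, here is a minimal sketch in Python of what finding associations between variables can look like. It only counts how often items appear together (support) and how often one item appears given another (confidence), which are the usual measures behind association rules. The patient records, the item names, and the function names are hypothetical, made up only for illustration, and this is not the full Apriori algorithm.

```python
from itertools import combinations

# Hypothetical toy data: each set lists the items recorded for one patient.
patients = [
    {"diabetes", "hypertension", "statin"},
    {"diabetes", "hypertension"},
    {"hypertension", "statin"},
    {"diabetes", "hypertension", "statin"},
    {"asthma"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of how often the consequent holds when the antecedent holds."""
    both = set(antecedent) | set(consequent)
    return support(both, records) / support(antecedent, records)


def frequent_pairs(records, min_support):
    """All pairs of items whose support reaches min_support."""
    items = sorted(set().union(*records))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, records)
        if s >= min_support:
            result[pair] = s
    return result


pairs = frequent_pairs(patients, min_support=0.6)
rule_confidence = confidence({"diabetes"}, {"hypertension"}, patients)
```

In this toy data, the rule "diabetes implies hypertension" holds for every patient with diabetes, so its confidence is 1.0. A real tool would search over larger itemsets and prune the ones that are not frequent, which is the part this sketch leaves out.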
Okay, so we are going to have a database, a table for example, in which we have information about patients. And one of the variables that we have for the patient is, let's say, whether the patient has developed a certain toxicity after treatment. What we would like is to extract a model that is able to predict, for a new patient for whom we still don't know if he will develop the toxicity or not, whether he will be prone to develop it. And we are going to do this with a prediction model. What I want you to take into account is that, even though we have descriptive patterns and predictive patterns, in data mining all the tasks that we are going to do are based on induction. We are not going to deduce anything from the data. What we are doing is analyzing historical data that we have collected, and from there, we are going to extract patterns. So whether it is a prediction model, or clusters, or association rules, everything will be obtained through an inductive process. So we have these two major problems. And now you can see it clearly on the slide, just for you to have it very clear, that we have prediction. In prediction, we are going to distinguish two cases: when the variable that we want to predict is a class, a discrete value, we call it classification, and when it is a numerical value, we call it value prediction. And in description, as we have seen at the very beginning, we are going to distinguish between clustering and association rules. Then, for each of these kinds of problems, we are going to have different techniques, which we will start talking about now. We are going to go a little deeper into them along the course. But in any case, this is good, because these names are the ones that you are going to find in the tools. So for example, for classification, we are going to have techniques such as decision trees, random forests, neural networks, and SVMs.
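As a rough illustration of the difference between classification and value prediction, and of what it means to induce a model from historical data, here is a minimal sketch in Python. The classifier is a decision tree with a single split, sometimes called a decision stump, and the value predictor is a straight line fitted by least squares. All the data, the variable names such as `dose_mg` and `recovery_days`, and the function names are hypothetical, assumed only for this example.

```python
# Hypothetical historical records: (dose_mg, developed_toxicity).
history = [(10, False), (20, False), (30, False),
           (40, True), (50, True), (60, True)]


def fit_stump(records):
    """Induce the rule 'toxicity if dose > t' with the fewest training errors."""
    doses = sorted(dose for dose, _ in records)
    # Candidate thresholds are the midpoints between consecutive doses.
    candidates = [(a + b) / 2 for a, b in zip(doses, doses[1:])]

    def errors(t):
        return sum((dose > t) != label for dose, label in records)

    return min(candidates, key=errors)


def predict_toxicity(threshold, dose):
    """Classification: the predicted variable is a class (True or False)."""
    return dose > threshold


# Hypothetical records for value prediction: (dose_units, recovery_days).
recovery = [(1, 3), (2, 5), (3, 7), (4, 9)]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


threshold = fit_stump(history)          # learned from past patients
new_patient_at_risk = predict_toxicity(threshold, 55)

slope, intercept = fit_line(recovery)   # value prediction: numeric output
predicted_days = slope * 5 + intercept
```

In both cases nothing is deduced from first principles: the threshold and the line come only from the historical records, and then they are applied to a new patient whose outcome we do not know yet.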
For value prediction we are going to have, for example, regression. Then for clustering, we will have the K-Means algorithm, which is the most popular one. For clustering we are also going to have Kohonen maps, and we are going to have hierarchical clustering with dendrograms. And then we are going to have the best-known algorithm for association rules, which is the Apriori algorithm. An important point that I want you to remember when finishing this lesson is that our goal here is to find the patterns. Once we find the patterns, they will be deployed and used by professionals. Our goal is to extract the patterns, not to use them. And once the professionals are using the patterns, this is not data mining any longer, this is just deployment of the models. So we have to keep in mind that data mining, knowledge discovery, or whatever name we call it, is extracting patterns from data, and that we have two major goals, description and prediction. We will go a little deeper in the following lessons. Thank you for your attention. I'm looking forward to seeing you in the next lessons. Here, as always, I'll leave you the references and materials. Thank you.