In this lesson, we are going to study classification and prediction methods. These methods are part of a stage called model building in data analytics. Data analytics, however, is a process involving several steps, as we see in this figure. The first step is business understanding, or domain understanding. This is the step we studied in module one, when we looked at background information in biology, medicine, and genetics. This is a traditional step. The second step, data understanding, is where we calculate means and other measures of central tendency for the different features, and also visualize the data. We have done some of that in the practical cases we have been looking at. The third step is data preprocessing, which we looked at particularly in module two, but also in module three when we did feature selection, which is often considered a data preprocessing or data preparation step.
Now we are advancing into model building and the different types of models, and we start in this lesson with classification and prediction models. After that comes a stage called training, testing, and evaluation, where we fine-tune the model until it is satisfactory to use in deployment. For us, that means using it in clinical practice, or using it to build some kind of tool for clinicians or clinical research. So at this stage we are going to build models for prediction tasks, and we start with an overview of classification and prediction methods. Because there are many of these, and new ones are created all the time, I am mostly going to give a basis of the main categories, with the understanding that the field keeps evolving.
Classification and prediction are the most frequently used tasks in data analytics. Everybody wants to predict the future; everybody wants, for example, to diagnose a patient very precisely and very accurately. Diagnosis is a typical prediction task in terms of analytics.
So these are definitely some of the most important models to build in data analytics. The methods are not only varied, they are also often closely linked to, or originate in, a particular domain. For example, some methods come from pattern recognition, which is what we do when we want to understand images and videos. There is also machine learning, coming from artificial intelligence: decision trees, for example, are methods that originated in machine learning. Artificial neural networks as well, so when we talk about deep learning, this is also from this area. Databases have also given us a lot of very interesting methods; that community is particularly good at making methods more efficient, such as scaling them up. And mathematical modeling and statistical modeling are also extremely important, particularly in biomedical domains.
The big difference between classification and prediction, from a specialist standpoint, is this: classification predicts a categorical-valued feature, also called a class. A typical classification task is diagnosis. In diagnosis, the doctor wants to predict, or find out, the actual disease category of a patient, whether this is breast cancer or skin cancer. These are all types of diseases. So that is a classification task, although of course it sits under this big umbrella, this family, of prediction methods. We refer to prediction when we predict a numerical-valued feature. For example, if we want to predict the survival length of a patient, or a risk index that would be numeric, then we would talk about a prediction task.
Classification and prediction tasks are all going to build some models, and the models will describe and distinguish classes or concepts for future prediction. For example, as I said, diagnosing a disease is a typical classification task.
And evaluating the risk or severity of a disease in a patient is a typical prediction task. What differentiates the methods we are going to see is the type of algorithm, which I often call a method in this course. As we said, an algorithm is somewhat similar to a recipe, or to the instructions you receive when you want to build a piece of furniture: it is a list of instructions, of steps. So it is the algorithm that differentiates the various classification and prediction methods. Some are based on analogies, making analogies between previous cases and new cases; some are based on rules or on neural networks; some use probabilities or statistics. These are the main types of classification and prediction methods that we are going to see in this lesson.
Model building involves using one dataset for training and another one for testing. We start here from a set of preprocessed data. Suppose we have only one dataset: very often what we do is separate it into a training set and a testing set. For example, we can take two-thirds, which gives us about 67%, for training data (often we go to 70% or 75%), and the remaining one-third or so for testing data. The training data is used to train the model: we apply the algorithm or method to the training data. Then we test the model on the test data, where we assess it and evaluate its prediction accuracy. So we use part of the data for training and part for testing, and we will see that there are ways of doing this much more systematically and in more depth. Of course, it is always preferable to have separate training data and testing data. This is to avoid what we discussed, overfitting to the data. If you have a completely independent testing set, there is much less overfitting involved, at least in the estimate of prediction accuracy. The end goal, of course, is to put the model in production.
That is when it will be applied to any new data, for example a new patient coming for a visit or online. At this step, generally, the preprocessed data that we used for training and testing the model is discarded, and we use completely new data. That is why the overfitting question has to be addressed carefully. Thank you.
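To make the train/test procedure described in this lesson concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data are synthetic, the helper names (`split_dataset`, `predict_nearest`, `accuracy`) are hypothetical, and the classifier is a simple one-nearest-neighbor rule, picked as one example of an analogy-based method rather than the specific method used in the course.

```python
import random


def split_dataset(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows, then cut them into a training set and a testing set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def predict_nearest(train, features):
    """Return the class of the closest training case (squared Euclidean distance)."""
    nearest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return nearest[1]


def accuracy(train, test):
    """Fraction of test cases whose predicted class matches the true class."""
    correct = sum(
        1 for features, label in test if predict_nearest(train, features) == label
    )
    return correct / len(test)


# Synthetic toy data (an assumption for illustration, not real patient data):
# two numeric features per case and a categorical class label.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "disease_A") for _ in range(50)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), "disease_B") for _ in range(50)]

# 70% of the cases for training, the remaining 30% held out for testing.
train_set, test_set = split_dataset(data, train_fraction=0.7, seed=1)
test_accuracy = accuracy(train_set, test_set)

print("training cases:", len(train_set))
print("testing cases:", len(test_set))
print("prediction accuracy on the test set:", test_accuracy)
```

Setting `train_fraction` to 0.67 or 0.75 gives the other splits mentioned in the lesson. The accuracy is computed only on cases that were held out from training, which is the point of keeping the two sets separate.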