Next we're going to talk about classification methods based on rules, and this is where we also see one of the most important classes of decision methods, which is based on decision trees. We'll look at decision trees for classification in particular, and we'll see why they are so important. A decision tree is built with a systematic method, that is, an algorithm, and here are its main steps. It uses a divide-and-conquer strategy: it recursively divides the training set so that, ideally, each division consists of examples from one class. First, it creates a root node, the root of the tree, and puts all the training data in it. Then it selects the best-fitting attribute. There are different methods for selecting this attribute, but the idea is to use the attribute that minimizes entropy, because we want the attribute that keeps the tree as compact as possible. Next, it adds a branch to the root node for each value of the split, and splits the data into mutually exclusive subsets along the lines of that split. These steps are repeated for each leaf node until a stopping criterion is reached.

So here is an example of a decision tree. I'm going to explain it, but this is pretty much what it looks like. There is a root, and at first the root holds all the samples, the training data. Then, each time, the algorithm decides which attribute to split on. Here, for example, it splits on cell size, because splitting on cell size is where the tree will be the most compact; in other words, cell size has the most discriminative power among all the attributes at this level. Once the split on cell size is decided, here specifically on whether CellSize is less than 2.5 or not, exactly the same process is applied in each of the subtrees. That's why this is called an iterative algorithm.
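The steps just described can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not any of the production algorithms: the dict-based tree encoding, the `best_split` helper, and the fixed `max_depth` stopping criterion are choices made for the sketch.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Pick the (feature, threshold) pair that minimizes the weighted
    entropy of the two resulting subsets (i.e., the 'best-fitting' split)."""
    best, best_score, n = None, float("inf"), len(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue  # split must produce two non-empty subsets
            score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Divide and conquer: split, then recurse on each subset."""
    # Stopping criteria: pure node, no usable split, or maximum depth reached.
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels)                  # leaf: class counts
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels)
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return {
        "feature": f, "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }
```

On a toy one-feature dataset where class 2 has small values and class 4 large ones, the builder finds the separating threshold and produces two pure leaves in a single split.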
So it's going to repeat this process over and over again until the whole tree is built. Now, one of the main questions in this algorithm, just as we saw before with questions like what the [INAUDIBLE] function is or what the optimal weights are, is the splitting criterion: which variable to split on first, what values to split at, and how many splits to form at each node. And again, we use the entropy measure. Entropy, as you know, is a measure of disorder, and since we want a tree that is compact, we want as little disorder as possible. That is why entropy is a good measure for building this tree.

Another question is the stopping criterion: when to stop building the tree. For example, I could say I will stop building the tree when all the instances are represented in a leaf, but sometimes you never reach that, so the question is when to stop. For example, we can decide on a certain depth: if I have not placed all the instances by a certain depth of the tree, say level three or four, we decide to stop. There is also the question of pruning the tree. Pruning is a mechanism to avoid what we call overfitting, which happens when the tree is so closely mapped to the training set that it will not generalize well on new data. So there are pre-pruning and post-pruning questions, and that is a very interesting and important consideration.

The most popular decision tree algorithms include ID3, C4.5, and C5, which belong to the same family of decision tree learners, as well as CART and CHAID. So again, there has been quite a bit of research in this domain. This tree-building process is very efficient for big data, and it has the advantage of being easily understandable. That is a main advantage of this method: as you see, it produces a visual, graphical representation as a result.
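To make the splitting criterion concrete, here is a small sketch of entropy and of the information gain of a candidate split. The helper names and the toy labels (classes 2 and 4, as in the lecture's example) are mine.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits.
    0 means perfect order (one class); 1 bit is a 50/50 mix of two classes."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Reduction in disorder achieved by splitting `labels` into `left` and `right`.
    The splitting attribute that maximizes this gain minimizes the entropy
    of the children, which is what keeps the tree compact."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted
```

A pure node such as `[2, 2, 2, 2]` has entropy 0, a balanced node such as `[2, 2, 4, 4]` has entropy 1 bit, and a split that separates `[2, 2, 4, 4]` into two pure halves gains the full 1 bit.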
So it's not a quote-unquote black box like neural networks, or even, in a certain sense, some of the other methods we have seen. Here, for example, on this tree we want to classify a sample between two classes, 2 and 4. For example, 4 would be malignant and 2 would be non-malignant. We look at different cell features: the cell size, the cell shape, the bare nuclei (the nucleus of the cell), the clumps in the cell (the clump thickness), and the chromatin would be another feature. Based on these features, we want to build this tree and classify a new sample.

So, for example, if I have a sample with a cell size less than 2.5, I go down the left branch of the tree; a new sample with a cell size of 2, say, sends me to the left. Then, if the sample has a BareNuclei value less than 5.5, I also go to the left. And then, if it has a ClumpThickness less than 6.5, I again follow to the left. When I arrive at a leaf node is when I perform the classification. And I say: well, in this particular leaf, 99.5% of all the samples belong to class 2, which is non-malignant, while 0.5% are malignant. So, based on a kind of majority vote, I'm going to say that this particular sample can be classified as class 2, non-malignant, with 99.5% likelihood. The strength of my certainty that this is non-malignant is 99.5%, which is also the probability that this is the right result, and that is a pretty good certainty. Of course, the certainty depends on the leaf; it will be different in each leaf.

What is also interesting, and why this is a method based on rules, is that once I have built a tree, I can create the corresponding rules. So, for example, I can transform this path into a rule: if CellSize is less than 2.5 and BareNuclei is less than 5.5 and ClumpThickness is less than 6.5.
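This walk from the root to a leaf can be sketched directly in code. The thresholds (2.5, 5.5, 6.5) and the 99.5% / 0.5% leaf proportions are the ones quoted in the lecture for the leftmost path; the nested-dict encoding is my assumption, and the other leaves are placeholders since the lecture does not detail them.

```python
# The leftmost path of the lecture's example tree, encoded as nested dicts.
# Internal nodes carry a feature and threshold; leaves carry class probabilities.
tree = {
    "feature": "CellSize", "threshold": 2.5,
    "left": {
        "feature": "BareNuclei", "threshold": 5.5,
        "left": {
            "feature": "ClumpThickness", "threshold": 6.5,
            "left": {2: 0.995, 4: 0.005},   # leaf: mostly class 2 (non-malignant)
            "right": {2: 0.50, 4: 0.50},    # placeholder for branches
        },
        "right": {2: 0.50, 4: 0.50},        # not detailed in the lecture
    },
    "right": {2: 0.50, 4: 0.50},
}

def classify(node, sample):
    """Walk from the root to a leaf, then return the majority class
    at that leaf together with its probability (the 'certainty')."""
    while "feature" in node:                # descend until we hit a leaf
        branch = "left" if sample[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    cls = max(node, key=node.get)           # majority vote at the leaf
    return cls, node[cls]

sample = {"CellSize": 2.0, "BareNuclei": 4.0, "ClumpThickness": 5.0}
print(classify(tree, sample))               # (2, 0.995): class 2, 99.5% certainty
```

The sample with cell size 2 takes three left branches and lands in the leaf where 99.5% of the training samples are class 2.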
Then my sample belongs to class 2. So, in addition to having a graphical, visual representation that is very understandable, I can also extract rules and build very simple classification software based on those rules.

So, for example, again, this tree classifies between two classes, 2 and 4: 2 being, for example, normal, non-malignant, and 4 being malignant. And we would make a classification based on data representing cell types in a tissue that is being analyzed. So we see the attributes here: cell size, bare nuclei, cell shape, clump thickness (you look at clumps in the cells), the chromatin, and whether the nuclei are there or not. Suppose we want to classify a new sample. To classify a new sample, I start at the root of the tree. Suppose my sample has a cell size less than 2.5; in this case I move to the left of the tree. Then I ask a question: what is the BareNuclei value? I know it is less than 5.5, so I move again to the left of the tree. Then I ask: what is the ClumpThickness? It is less than 6.5, and that brings me to a leaf. At that leaf I'm capable of providing a very likely diagnosis, because at this leaf I see that I have 99.5 percent of the samples in class 2 and 0.5 percent in class 4. It also tells me that there are 416 samples there, so it's a good number. So the node, which again is a leaf, would give me that for this particular sample the likelihood of being non-malignant, or class 2, is 99.5%. It's also a probability that this is the right result, 99.5%, which is a high probability, and again it's also the certainty that I have in the answer. So, as you can see, one big advantage of this method is that the trees are rather easy to build, so it doesn't require a lot of computer processing power.
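The rule extraction mentioned here, one rule per root-to-leaf path, can also be sketched in code. This is a minimal version under my own assumptions: it reuses a nested-dict tree encoding, and the leaf probabilities shown are illustrative placeholders, not figures from the lecture.

```python
def extract_rules(node, conditions=()):
    """Collect one IF-THEN rule per root-to-leaf path of the tree."""
    if "feature" not in node:               # leaf: class-probability table
        cls = max(node, key=node.get)       # the rule predicts the majority class
        cond = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {cond} THEN class = {cls}"]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + (f"{f} < {t}",))
            + extract_rules(node["right"], conditions + (f"{f} >= {t}",)))

# A two-level fragment in the spirit of the lecture's tree;
# the leaf probabilities here are made up for the illustration.
tree = {
    "feature": "CellSize", "threshold": 2.5,
    "left": {
        "feature": "BareNuclei", "threshold": 5.5,
        "left": {2: 0.99, 4: 0.01},
        "right": {2: 0.3, 4: 0.7},
    },
    "right": {2: 0.1, 4: 0.9},
}

for rule in extract_rules(tree):
    print(rule)
```

The first rule printed is the leftmost branch: `IF CellSize < 2.5 AND BareNuclei < 5.5 THEN class = 2`.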
But it also has its visual presentation, and practitioners like this type of representation, because they can understand it: when they want to make a decision, they know exactly why they are making that decision. At each step along the way the tree asks a question, and they are capable of answering it. In addition, we can also extract the information represented in this tree. For example, here the rule associated with the leftmost branch would be: if CellSize is less than 2.5, and BareNuclei is less than 5.5, and ClumpThickness is less than 6.5, then the sample belongs to class 2. So, for each branch of the tree, you can extract a rule, and you can then transform these rules into software that performs automatic classification, in a very compact way. So this method has a lot of advantages. And again, methods like decision trees exist, and they are not the only ones. Thank you.
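Transforming that extracted rule into classification software is then a one-liner. The function name is mine; the thresholds and the class-2 conclusion are the ones from the lecture's leftmost branch.

```python
def rule_leftmost(cell_size, bare_nuclei, clump_thickness):
    """The rule from the leftmost branch, written directly as code.
    Returns the predicted class (2 = non-malignant) when the rule fires,
    or None when this particular rule does not apply to the sample."""
    if cell_size < 2.5 and bare_nuclei < 5.5 and clump_thickness < 6.5:
        return 2
    return None

print(rule_leftmost(2.0, 4.0, 5.0))   # 2: the rule fires, non-malignant
print(rule_leftmost(3.0, 4.0, 5.0))   # None: another branch's rule applies
```

A complete classifier would be the set of such functions, one per branch; exactly one of them fires for any sample, because the branches partition the feature space.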