[MUSIC] Hi, I'm John Kim from Sungkyunkwan University. From now on, let's take a look at decision trees in three videos; this is the first one. Today's content is the basic idea of a decision tree, and I want to explain entropy and how to select attributes. We have already seen this data table in the earlier lectures on classifiers. It is the same data table: here are four attributes as input, and the result is one column, whether he played tennis or not. For example, if the weather condition is sunny, hot, high humidity, and weak wind, then he didn't play tennis. In another case, if the weather condition is sunny, hot, high humidity, and strong wind, he didn't play tennis either. With this tennis-playing record, we want to predict whether he will play tennis. Given a weather condition of sunny, mild, high humidity, and strong wind, will he play or not? That is the problem we propose to solve. If we can extract rules or patterns from the data set, it becomes possible to predict whether he plays tennis.

Now let me give a simpler version of this. Here are the numbered data samples, three different attributes, and the output of playing tennis or not; T means true and F means false. If attribute A is F, will he play tennis or not? If attribute B is true, will he play tennis or not? And if attribute C is true, will he play tennis or not? We can view the table in each of these ways, and with these three views of the same table we choose one attribute to estimate whether he plays. Suppose we first choose attribute A. If attribute A is true, play is true in both cases; if attribute A is false, the play results are true, false, and false. If we select attribute B instead, then when B is true the results are true, true, and true, and otherwise the results are false and false. And how about selecting the third attribute, C? If attribute C is true, the play results are true and false.
And if it is false, the results are true, true, and false. So which attribute is proper for predicting the play result? Yes, attribute B is a very good attribute for estimation, because when attribute B is true, play is true, and when attribute B is false, play is false. So how do we measure which attribute is better for making this judgment? Let me introduce entropy. It is a good measure for this decision. Entropy can be described as a degree of disorder. Here are four different sets, from set 1 to set 4. Sets 1 and 2 are homogeneous, as you can see on this slide, while set 3 has more disorder, so it has higher entropy than the others. Entropy can thus be formalized as a function that measures the degree of disorder.

Here are nine different sets, where each differs from the next by just one bit. By the definition, set 5 has the highest entropy, and set 1 and set 9 share the same, lowest entropy. Here is the entropy formula: for a binary set, E = -p1 log2 p1 - p0 log2 p0, where p1 is the probability that 1 appears in the set and p0 is the probability that 0 appears. If we have a set like A, we can calculate its entropy with this formula, and the result is 0.954. As the proportion of positive samples varies, entropy traces the curve shown on this slide. And if a set has classes from C1 to Cn, the entropy generalizes to E = -sum over i of p_i log2 p_i.

Now let me introduce the average entropy of a partition. It is based on entropy: if we partition the original data set, what is the weighted average entropy of the resulting parts? Here is a simple example. We have an original data set, P1 partitions it one way, and P2 partitions it differently. We can calculate the entropy of each part, and as you can see on this slide, P2 yields the lower entropy. We can work out the details for partitioning 1 and partitioning 2.
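The two quantities just defined can be sketched in a few lines of Python. This is a minimal illustration with function names of my own choosing; the 5-ones/3-zeros set below is one composition consistent with the slide's value of 0.954 for set A, not a value confirmed by the slide itself.

```python
import math

def entropy(counts):
    """Entropy of a set: E = -sum_i p_i * log2(p_i), with 0*log2(0) = 0.
    `counts` lists how many samples fall in each class."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            result -= p * math.log2(p)
    return result

def average_partition_entropy(partitions):
    """Weighted average entropy over partitions, each given as class counts.
    The weight of a partition is its share of all samples."""
    total = sum(sum(part) for part in partitions)
    return sum(sum(part) / total * entropy(part) for part in partitions)

# A set with 5 ones and 3 zeros (hypothetical composition for set A):
print(round(entropy([5, 3]), 3))                     # 0.954

# A partition whose parts are perfectly homogeneous averages to zero:
print(average_partition_entropy([[3, 0], [0, 2]]))   # 0.0
```

Note that a pure partition always contributes zero, so the weighted average can only go down as splits become more homogeneous.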
We calculate the entropy of each partition set and then average them, weighting each by its share of the samples. In the case of partitioning 1, the average partition entropy is 0.95, and in the case of partitioning 2, the result is 0.45.

Let's go back to the simple toy data set. Our question is: which attribute is good for deciding play? If attribute A is true, play is true, and so on; we have these results based on the table. We can use entropy here: which attribute partitions play with the lowest entropy? We calculate the average partition entropy for each attribute. Before partitioning, the entropy of the whole set is 0.97. Working through the details, attribute B yields the lowest average partition entropy, so attribute B is the good attribute for deciding whether he plays or not.

Now let's go back to the original tennis-playing record. Here are 14 data samples with four attributes, and we have 9 yes and 5 no for play tennis, so the entropy is 0.940. If we select humidity as the splitting attribute, we split into high and normal, and the average partition entropy is 0.789. How about using a different attribute instead of humidity? Before turning to the other attributes, let me define the gain for humidity. The gain is the original entropy minus the average partition entropy, which gives 0.940 - 0.789 = 0.151. So the gain measures how much the average partition entropy drops from the original entropy. Instead of humidity, we can use temperature to distinguish play or not; its average partition entropy is 0.911, so the gain of temperature is 0.029. As we have seen, humidity has a gain of 0.151, and outlook has a gain of 0.246.
And lastly, wind has a gain of 0.048. Among these four attributes, outlook has the highest gain. That means it decreases the average partition entropy the most, so it is the best attribute for distinguishing whether he plays tennis. So we select outlook as the root condition and divide the data by its values; for two of the branches we still don't know the answer, so we recursively apply the same procedure to those two cases.

Let me summarize this video with this slide. I talked about entropy. Entropy is defined by the formula we saw, and if entropy is high, the set is more disordered. I also introduced the gain for each attribute. If an attribute is proper for distinguishing the result, its gain has a high value. So we can select attributes using the gain based on entropy. Thank you. [MUSIC]
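The attribute-selection computation from this video can be reproduced end to end. The sketch below assumes the slides use the classic 14-sample play-tennis table (Quinlan's data set, which matches the counts and gain values quoted in the lecture); the helper names are my own.

```python
import math
from collections import Counter

# The classic 14-sample play-tennis table (assumed to match the slides).
# Columns: Outlook, Temperature, Humidity, Wind, Play.
DATA = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    """E = -sum_i p_i log2 p_i over the class frequencies in `labels`."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(attr):
    """Gain = entropy before splitting - weighted average entropy after
    splitting on `attr`."""
    i = ATTRS.index(attr)
    all_labels = [row[-1] for row in DATA]
    avg = 0.0
    for value in set(row[i] for row in DATA):
        subset = [row[-1] for row in DATA if row[i] == value]
        avg += len(subset) / len(DATA) * entropy(subset)
    return entropy(all_labels) - avg

for attr in ATTRS:
    print(f"{attr}: gain = {gain(attr):.3f}")
```

Running this reproduces the ordering from the lecture: outlook has the highest gain, followed by humidity, wind, and temperature, which is why outlook becomes the root of the tree.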