Hi. In this video clip, let me explain how to get descriptive statistics and how to create a one-hot vector. I will explain the concept of a one-hot vector later, when we get to it.
So far I have shown you a few cases of getting descriptive statistics. Now I'm providing a more comprehensive example. This time we are using the full iris data, containing 150 observations for three kinds of iris. So let me use this one, describe. As you see here, the iris kind column is ignored, and all 150 observations are used for calculating the descriptive statistics. How do we know that? Because the count is 150. It means that the number of observations used for calculating those descriptive statistics is 150. Sepal length, sepal width, petal length, petal width: those are the column names of the iris dataset. There is also a target variable. The target variable holds the numerical labels 0, 1, 2. So in this case the data frame contains five numerical variables and one string variable, the label. pandas automatically identifies whether the data type of each variable is a number or a string. If the data type is string, the string variables are left out when calculating descriptive statistics. So you don't need to worry about which variable is a string and which is a number. You simply use the describe function, and descriptive statistics are created only for the variables holding numbers, floating-point or integer.
Now, we can also calculate descriptive statistics for a subset. Not this one. Here, let me activate this command line and then execute it. So we are taking only four columns. We don't want descriptive statistics for target, because it is actually a label: even though the information contained in target is integer, those integers are labels. So if you want to calculate descriptive statistics for only the four variables, you can slice, then apply the describe function, and you get it.
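As a rough sketch of the two describe calls just discussed, the snippet below uses a tiny hand-made stand-in for the iris data frame. The six rows, the column names and the variable names are assumptions for illustration only; the data frame used in the talk has 150 rows.

```python
import pandas as pd

# Toy stand-in for the iris data frame (six illustrative rows, not the real 150).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "target":       [0, 0, 1, 1, 2, 2],
    "label":        ["setosa", "setosa", "versicolor",
                     "versicolor", "virginica", "virginica"],
})

# describe() summarizes numeric columns by default, so the string
# column "label" is skipped while the integer "target" is included.
full_stats = df.describe()

# Select only the four measurement columns so the integer-coded
# target is left out of the summary.
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
feature_stats = df[feature_cols].describe()

print(full_stats)
print(feature_stats)
```

The count row of either table reports how many observations went into each statistic, which is how the 150 in the talk can be read off.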
What about the other case? In this case you are dropping the target variable, and the axis is set to 1, so it means that the target column is dropped. Then you apply the describe function. What do you see? Exactly the same outcome. So you can drop the target variable instead of slicing.
Another way of calculating descriptive statistics is specifying the variable. Say you want descriptive statistics for sepal length. In that case you specify the variable, apply the describe function, and execute the cell. Then the statistics are here.
Another way is groupby. This is a very useful tool. There are three groups in the iris dataset: setosa, versicolor, and virginica. So, based on the label, you can calculate descriptive statistics per group. In this case there are four variables and three kinds, setosa, versicolor, virginica, so it is a horizontally long data frame. Not all statistics are presented here; some are hidden. So you want to see it vertically. Can we have a better presentation of this outcome? Yes, we can. First you choose the variables that you want to calculate, then apply the groupby function, and then the describe function. Actually this one is the same as the one above, except that at the end you are transposing. It means that the outcome is transposed. So the outcome is this one. Setosa, versicolor, virginica become the column labels. Then, for each variable, sepal length, sepal width and so on, what used to be the columns becomes the row index, and the descriptive statistics are calculated and presented. So obviously this is better than the previous one, and all you do is transpose. This next command line is slightly different from the previous one, but basically the same. So what if you use transpose in this case? What happens? Exactly the same outcome. So apply groupby and then describe, and if you want to see the outcome vertically, you apply transpose.
Next, we again work with the data after dropping the target variable.
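The drop, single-column, and groupby variants described above might look roughly like this. The toy data frame and all variable names are assumptions made for illustration; the calls themselves (`drop`, `groupby`, `describe`, `.T`) are standard pandas.

```python
import pandas as pd

# Toy stand-in for the iris data frame (six illustrative rows).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "target":       [0, 0, 1, 1, 2, 2],
    "label":        ["setosa", "setosa", "versicolor",
                     "versicolor", "virginica", "virginica"],
})

# axis=1 tells drop() that "target" names a column, not a row.
no_target_stats = df.drop("target", axis=1).describe()

# Descriptive statistics for one chosen variable.
sepal_length_stats = df["sepal_length"].describe()

# One block of statistics per iris kind: three rows, and
# 4 variables x 8 statistics = 32 columns, hence "horizontally long".
by_label = df.drop("target", axis=1).groupby("label").describe()

# Transposing puts the iris kinds in the columns, which reads better.
by_label_vertical = by_label.T

print(by_label_vertical)
```

Transposing changes only the orientation of the table, not the numbers in it.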
If you want to calculate the covariance matrix, you get this one. For example, between sepal length and sepal width, this is the covariance. If you want to get the correlation table, then instead of cov you use corr. Along the diagonal line the correlation coefficient is one, because it is the correlation of a variable with itself, so obviously the coefficient is one. The correlation between each pair of variables is calculated, and this matrix is symmetric, as you see here.
Now let me explain the one-hot vector. What is a one-hot vector? A one-hot vector is a way of representing a label. In the iris data, the target variable contains 0, 1, 2: 0 is assigned to setosa, 1 to versicolor, and 2 to virginica. If you use this kind of labeling, what happens? Someone may misunderstand the iris dataset and think that the increasing number has a meaning. So maybe virginica is better than versicolor, and versicolor is better than setosa. But the target value carries no information of that kind at all. The numbers are simply assigned, and the target variable is used for classification. Still, if we assign numbers that way, some people may get confused. In order to overcome that confusion, we use a one-hot vector. And sometimes, in AI analysis, the label information is required to be provided as a one-hot vector, not in the target-style labelling.
So, the one-hot vector. In order to give you an example of creating one, here a data frame is created. The auto firm variable is a list object that contains automobile brands: Hyundai, Honda, Kia, Audi, Benz. There are 15 cases of five brand names. The year is also a list: 15 years starting from 1990, generated as a range up to 2005 with a step of one, so it increases sequentially by one. And rank is also a list, from 0 to 14. Then they are combined to make a data frame. This data frame looks like this one: year, rank, maker, right? So in this case maker is string information. From this maker we can create a one-hot vector. There are many ways of creating a one-hot vector.
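For the covariance and correlation tables mentioned at the start of this passage, a minimal sketch could look like the following. The toy rows and variable names are assumptions for illustration; the string label column is left out of the toy frame so that only numeric columns reach `cov` and `corr`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the numeric part of the iris data frame.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "target":       [0, 0, 1, 1, 2, 2],
})

# Drop the integer-coded label before measuring relationships.
features = df.drop("target", axis=1)

cov_matrix = features.cov()    # pairwise covariances
corr_matrix = features.corr()  # pairwise Pearson correlations

# Each variable correlates perfectly with itself: the diagonal is all ones.
diagonal_is_one = np.allclose(np.diag(corr_matrix), 1.0)

# corr(a, b) equals corr(b, a), so the matrix equals its transpose.
is_symmetric = np.allclose(corr_matrix, corr_matrix.T)

print(cov_matrix)
print(corr_matrix)
print(diagonal_is_one, is_symmetric)
```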
But based on my experience, I recommend that you use the function provided by pandas, because it is the easiest way of creating a one-hot vector based on label information. If we apply the set function to the auto firm list, what happens? The set function returns the unique values. There are five unique brand names. Even though there are 15 entries, there is duplication, so there are five unique auto brand names.
Then the one-hot vector. The one-hot representation is two-dimensional: for each observation there is a list containing 0s and 1s. If the first observation is Audi, then only the Audi entry is one. All other entries have the value 0. That is the one-hot vector. As I wrote here, a one-hot vector means that among the elements of the vector only one element has the value one and the others have zero. It is like one neuron representing one brand. If there are five auto brands, there are five neurons, one matching each brand. And if an observation belongs to a specific car brand, then that neuron becomes hot. So it is called one-hot. The hot neuron has the value one, and all other neurons stay quiet, so their value is zero. A one-hot vector represents such a case.
So how can we create a one-hot vector from the maker variable? You use pd.get_dummies. So simple. That's why I told you before that it is the simplest way. So the one-hot vector is created. Let's execute, and what we see is this one. This is the one-hot vector representation of the auto maker label information. The first observation belongs to Hyundai, that's why that column is hot, and the other variables are cold: nothing, no reaction. The second one is Honda, that's why Honda has the value one. All other values are zero. So that's why it is called a one-hot vector. Sometimes, for label information, we need to convert the one-column label into this one-hot vector structure. Now we can join this one-hot vector to the original data frame, df. Then what happens?
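The auto maker example can be approximated as below. The exact order of the brand list, the variable names, and the `dtype=int` argument (used here so the output shows 0 and 1 rather than booleans) are assumptions; the talk only establishes that the first observation is Hyundai and the second is Honda.

```python
import pandas as pd

# Assumed reconstruction of the example: 15 entries, five unique brands.
makers = ["Hyundai", "Honda", "Kia", "Audi", "Benz"] * 3
years = list(range(1990, 2005))  # 15 years, step of one
ranks = list(range(15))          # 0 to 14

df = pd.DataFrame({"year": years, "rank": ranks, "maker": makers})

# set() removes duplicates, leaving the five unique brand names.
unique_makers = set(makers)

# One column per brand; each row has a single 1 in the column
# of its own brand and 0 everywhere else.
one_hot = pd.get_dummies(df["maker"], dtype=int)

print(unique_makers)
print(one_hot.head())
```

`get_dummies` orders the new columns alphabetically by brand name, so the column order need not match the order in which brands first appear.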
This is what we get: the one-hot columns are added. So by adding, by joining, by combining, we can check whether the one-hot vector creation is correct or not. Hyundai: 1. Honda: 1. Kia: 1. So in each row, only the variable whose feature name matches the brand has the value one. All other variables have zero. So it is called a one-hot vector, and using the get_dummies function you can easily create a one-hot vector.
Now a question, true or false: if we convert the target or label variable of the iris data into a one-hot vector, the shape of the one-hot vector is (150, 3). Yes, that is correct, because there are three kinds of iris. That's why the columns have three dimensions. And the row number: there are 150 observations, and each observation belongs to one class.
In order to check this outcome, let me give you a final example. There are two ways of creating a one-hot vector here: from target and from label. In the case of target, integer information is used for labeling. In the case of label, we created string labels, right? So we execute the first one, and then what do we see? This: a table with 150 rows. So the one-hot vector is created. And the other one: we are creating a one-hot vector based on the string labels setosa, versicolor, virginica. 50 setosa, 50 versicolor, 50 virginica. And we can compare whether the two one-hot vectors are exactly the same. They must be exactly the same. How can we compare? This way: we convert them to NumPy arrays and check whether each element is equal to the corresponding element of the other. So, all True. They must be True, right?
So using the get_dummies function you can easily create a one-hot vector. I also explained the meaning, the concept, of the one-hot vector.
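The join check and the target-versus-label comparison from this last part could be sketched as follows. It uses a six-row toy target instead of the 150-row iris data, so the shape here is (6, 3) rather than (150, 3); the variable names and the integer-to-name mapping written out in code are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the iris target (the talk's data has 50 rows per class).
target = pd.Series([0, 0, 1, 1, 2, 2], name="target")
kind_names = {0: "setosa", 1: "versicolor", 2: "virginica"}
label = target.map(kind_names).rename("label")

# One-hot vectors from the integer target and from the string label.
one_hot_from_target = pd.get_dummies(target, dtype=int)
one_hot_from_label = pd.get_dummies(label, dtype=int)

# Join the one-hot columns next to the original label to eyeball them.
checked = label.to_frame().join(one_hot_from_label)

# Element-wise comparison on plain NumPy arrays, ignoring column names.
elementwise_equal = np.array(one_hot_from_target) == np.array(one_hot_from_label)
all_equal = bool(elementwise_equal.all())

print(checked)
print(one_hot_from_target.shape, all_equal)
```

The two arrays line up because `get_dummies` sorts its columns, and the alphabetical order setosa, versicolor, virginica happens to match the integer order 0, 1, 2.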