Let us discuss in more detail how to work with particular columns and how to convert columns from one type to another. We continue working with the dataset adult.csv. Let us consider, for example, the age column. You see that this is a column of integers, and its dtype is int64. But a variable like age could also hold non-integer values, and we can convert this column to a different type using the method astype. For example, the command astype('float') converts all these numbers to the float type. Now you see that they are no longer integers but floats, as the decimal point here shows, and the corresponding dtype has changed as well. In some cases, this conversion from integer to float happens automatically, for example, when you perform an operation on the numbers in this column that makes some of them non-integer. But astype is a general method that converts variables from one type to another. Again, this command only returns a new column; it does not change the initial dataset. If I want to change this column in the initial dataset, I have to assign the result back to it, like this. Now if I check the dtypes, I see that age is float, not int. In the same way, I can convert between categorical and numeric variables. For example, we have the categorical variable education, and we may be interested in converting it from strings to numeric values. As we discussed, this can be done in different ways. For example, we can use label encoding, which simply assigns a number to each category level, without any particular meaning attached to that number. This can be done in Pandas in the following way. First, we take our column education, and you see that its dtype is currently object. As I said previously, Pandas has a dedicated type for storing categorical variables.
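The conversion steps described above can be sketched as follows. This is a minimal example, assuming a small hypothetical DataFrame that stands in for a few rows of adult.csv (the real file is not read here); it shows that astype returns a new Series, that true division converts integers to floats automatically, and that you must assign the result back to change the column.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of adult.csv; the real file is not read here
df = pd.DataFrame({
    "age": [39, 50, 38, 53, 28],
    "education": ["Bachelors", "Bachelors", "HS-grad", "11th", "Masters"],
})
original_dtype = df["age"].dtype  # an integer dtype (int64 on most platforms)

# astype returns a NEW Series; df itself is not modified
age_float = df["age"].astype("float")
print(age_float.dtype, df["age"].dtype)

# True division always yields floats: an example of automatic int -> float conversion
half_age = df["age"] / 2
print(half_age.dtype)

# To change the column in the DataFrame, assign the converted Series back
df["age"] = df["age"].astype("float")
print(df.dtypes)
```

Printing df.dtypes before and after the assignment is a quick way to confirm that only the assignment, not the astype call by itself, changes the stored column.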
We can convert this object column to a categorical one using astype('category'). The content of this column did not change much, but if you look at the information about this column, you see that it is no longer just an object column, a collection of strings, but a categorical variable. It has 16 levels, that is, 16 possible categories, and here they are. Let us store this new categorical column in the same place as before, under the name education. Now we can convert this categorical variable into a numeric one in the following way. We take the education column and then write cat. This is a special thing called an accessor: it gives us access to specific methods associated with categorical variables. After cat, I can use codes. You see a new column that contains integer numbers, and each number is the code of a particular category. Let me add this new column to my dataset under a new name, education-codes. Now let us look at education and education-codes together. You see that when education is Bachelors, the code is 9, and when education is HS-grad, the code is 11. So different categories have different codes. We can find out which category corresponds to which code in the following way. I take the education column, use the cat accessor, and then the categories property. Here all the categories are listed in a list-like object (a pandas Index), and the position of each category is its code. So the category 10th has code 0, the category 11th has code 1, and so on; the categories are simply listed in the order of their codes. By default they are sorted as strings, which is why 10th comes first. This is the simplest possible way to encode categorical variables. But if you are interested in something more complicated, Pandas has special functions for that as well.
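Here is a minimal sketch of label encoding with the categorical dtype, assuming a hypothetical six-row education column instead of the full adult.csv. Because only five levels appear in this toy data, the codes differ from the 9 and 11 seen on the real dataset; the mapping rule (position in cat.categories equals the code) is the same. The column name education-codes follows the lecture.

```python
import pandas as pd

# Hypothetical stand-in for the education column of adult.csv
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "11th", "Masters", "9th", "HS-grad"],
})

# object (strings) -> pandas categorical dtype, stored back under the same name
df["education"] = df["education"].astype("category")
print(df["education"].dtype)  # category

# .cat is the accessor for categorical-specific attributes and methods
df["education-codes"] = df["education"].cat.codes
print(df[["education", "education-codes"]])

# Position i in .cat.categories is the category whose code is i
categories = df["education"].cat.categories
code_to_category = dict(enumerate(categories))
print(code_to_category)
```

Note that the inferred categories are sorted as strings, so "11th" precedes "9th" here, just as "10th" precedes "11th" in the full dataset.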
For example, we can use dummy encoding or one-hot encoding with pd.get_dummies. This function constructs dummy variables from a categorical variable. Again, let us use education. You see that this function returns a new data frame in which each row consists of zeros and ones. Each column corresponds to one category level of this variable, and the 1 in a row marks the level that the variable takes for that particular object. For example, in the first row, education is Bachelors, so we have 1 in the Bachelors column. The same holds for the second row, and for the third row it is HS-grad. If I want to add these new variables to my initial data frame, I can use, for example, the join method. Now I have new columns that give a one-hot encoding of the education variable. If I want true dummy encoding, where the number of new columns equals the number of category levels minus one, I have to pass a special option to get_dummies: drop_first=True. Now you see that we no longer have a column for the 10th category, because if all the remaining columns are zero, the 10th column would have to be 1, and conversely. If you work, for example, with linear regression, you probably want this kind of encoding for your categorical variables, since the full set of one-hot columns together with an intercept is perfectly collinear. So these are the simplest ways to work with columns and to encode them. By the way, if you have a numeric column and want some of its descriptive statistics, you can get them directly. You can take this column and compute, for example, its mean value, and you get just a number: the mean of the values in this column. In the same way, you can get the standard deviation, the median, or something else. There is no dedicated function for the interquartile range.
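A minimal sketch of one-hot versus dummy encoding with pd.get_dummies, assuming a hypothetical four-row DataFrame in place of adult.csv. Recent pandas versions return True/False columns by default, so dtype=int is passed here to get the zeros and ones described above.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of adult.csv
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "education": ["Bachelors", "Bachelors", "HS-grad", "11th"],
})

# One-hot encoding: one 0/1 column per category level, exactly one 1 per row
one_hot = pd.get_dummies(df["education"], dtype=int)
print(one_hot)

# join aligns on the index, so each original row gets its own indicator values
df_one_hot = df.join(one_hot)
print(df_one_hot)

# Dummy encoding: drop_first=True drops the first level ("11th" here),
# which is implied whenever all remaining indicators are 0
dummies = pd.get_dummies(df["education"], drop_first=True, dtype=int)
print(dummies)
```

Dropping one level keeps the information (the row for "11th" is all zeros) while removing the redundant column that would otherwise be collinear with a regression intercept.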
But you can find it using the quantile method, which returns a particular quantile of your data. For example, quantile(0.75) gives the third quartile and quantile(0.25) gives the first quartile, and the interquartile range is just the difference between them. So we can compute it in the following way. This is how we work with particular columns in a data frame. Actually, this is just the beginning of the story of Pandas; later we will discuss more complex ways to use Pandas to analyze data frames.
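The descriptive statistics mentioned above, including the interquartile range built from two quantile calls, can be sketched like this. It assumes a hypothetical Series of ages standing in for df["age"]; Series.quantile uses linear interpolation by default, and Series.std computes the sample standard deviation (ddof=1).

```python
import pandas as pd

# Hypothetical ages; a stand-in for df["age"] from adult.csv
age = pd.Series([39, 50, 38, 53, 28, 37, 49, 52], name="age")

mean_age = age.mean()
std_age = age.std()        # sample standard deviation (ddof=1) by default
median_age = age.median()

# No dedicated IQR method on a Series: take the difference of two quartiles
q1 = age.quantile(0.25)    # first quartile
q3 = age.quantile(0.75)    # third quartile
iqr = q3 - q1
print(mean_age, std_age, median_age, q1, q3, iqr)
```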