Welcome to Turn Categorical Values to a Numeric Variable in R. In this video, you'll learn how to make statistical modeling easier by converting categorical values into quantitative values. Most statistical models cannot take objects or strings as input; for model training, they accept only numbers.

In the Airline dataset, the "Reporting_Airline" feature is a categorical variable with nine possible values, "AA", "AS", "B6", "DL", "HP", "PA (1)", "TW", "UA", or "VX", all stored as the character type. For further analysis, you'll need to convert these values into a numeric format. To solve this problem, encode the values by adding a new feature for each unique element in the original feature you would like to encode. The feature "Reporting_Airline" has nine values, so you create nine new features: "AA", "AS", "B6", and so on. When a value occurs in the original feature, you set the corresponding new feature to one and set the rest of the new features to zero. In the reporting airline example, in the first row, the reporting airline is "UA". Therefore, you set the feature "UA" to one and the other features to zero. Similarly, for the second row, the reporting airline value is "AS". Therefore, you set the feature "AS" to one and the other features to zero.

In tidyverse, you can use the spread() function to convert categorical variables to dummy variables. To do this, you need to define three arguments in the spread() function: key, which is the column whose values become the names of the new columns, in this case "Reporting_Airline"; value, which is the column whose values fill the new columns, in this case "dummy", a helper column that holds 1 in every row; and fill, which replaces missing values with zero (without it, they are left as NA).

Alternatively, instead of assigning the dummy values 0 or 1, you can assign the flight delay values, "ArrDelay", to each feature. In the reporting airline example, for the first row, the reporting airline is "UA", and the arrival delay is 2 minutes.
Therefore, you can set the feature "UA" to 2 and leave the other features as NA (or blank). Similarly, for the second row, the reporting airline value is "AS", and the arrival delay is -21 minutes. Therefore, you can set the feature "AS" to -21 and leave the other features as NA (or blank).

To accomplish this, you again use the spread() function. Continuing with the previous example, the key argument is set to the "Reporting_Airline" column and, for the value, you can choose either a column of 0 or 1 indicators or the values in the "ArrDelay" column. This example does not set the fill argument, so the function leaves the missing values as NA. Also, remember that the spread() function drops the "Reporting_Airline" and "ArrDelay" columns from the output when you use them as the key and value arguments.

In this video, you learned a technique for turning categorical values into numeric values. This technique is often called "one-hot encoding". It uses the spread() function to create one dummy variable per category and fills each one with either a 0 or 1 indicator or another value that makes sense for your analysis, such as the arrival delay.
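
As a rough sketch of the two spread() calls described in this video, the following R code builds a tiny made-up data frame and reshapes it both ways. The first two rows reuse the "UA" and "AS" values mentioned in the narration; the third row and the "FlightId" column are assumptions added only for illustration, so that every row stays uniquely identifiable after the reshape. The helper column is named "dummy" to match the value argument described above. Note that in newer versions of tidyr, spread() is superseded by pivot_wider(), although it is still available.

```r
library(tidyr)

# Toy data (hypothetical): rows 1 and 2 follow the video, row 3 is invented.
# FlightId is an assumed row identifier, not a column named in the video.
flights <- data.frame(
  FlightId = 1:3,
  Reporting_Airline = c("UA", "AS", "UA"),
  ArrDelay = c(2, -21, 5),
  stringsAsFactors = FALSE
)

# Variant 1: 0/1 dummy variables.
# Add a helper column that is 1 in every row, then spread it across airlines.
flights$dummy <- 1
one_hot <- spread(flights, key = Reporting_Airline, value = dummy, fill = 0)

# Variant 2: fill the airline columns with the arrival delay instead.
# No fill argument is given, so non-matching cells stay NA.
delay_wide <- spread(
  flights[, c("FlightId", "Reporting_Airline", "ArrDelay")],
  key = Reporting_Airline,
  value = ArrDelay
)

print(one_hot)
print(delay_wide)
```

In both results, the columns passed as key and value no longer appear; each airline code becomes its own column instead.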