A useful start to any data analysis exercise is to ask yourself: what type of variables am I studying? Are they numerical or are they categorical? Today I want to teach you these concepts and what the difference is between them. I will also discuss why it is important to ask yourself that question.
Okay, remember this dataset? It was data on international transfers. Have a look at the variables. They do not look alike. Total time is different from type of reclaim. Therefore, we make the following distinction. There are two main categories of data. The first one is numerical, and the second one is categorical. Within numerical data, you can make a subdivision between discrete and continuous data. Within categorical data, we can make a subdivision between ordinal, nominal, and binary data.
Let's first take a closer look at numerical data, also known as quantitative data. Numerical data is data based on numbers that you can calculate with, such as temperature, dimension, or the number of people in a room. You can make a distinction between discrete data, which is count data or data that is rounded, and continuous data, like temperature, which can take any value within a range.
Categorical data is qualitative data. Here, you divide the data into groups. If there is a meaningful order to the groups, it is ordinal data. If the groups consist of unordered categories, it is nominal data. If there are only two options available, we call it binary data.
Okay, let's take a look at our example again. Can you identify the numerical and categorical variables in this dataset? The total time is, of course, a numerical variable. Since time is not limited to integers only, it is a continuous variable. The type of reclaim is categorical, and it is nominal because there is no clear ordering in the types of reclaim. The number of iterations is a numerical variable, and it is discrete, as it is count data.
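The classification just described can be sketched in a few lines of Python. This is only a rough illustration: the function name classify, the ordered flag, and all sample values are made up for this sketch and are not taken from the transfers dataset. A simple rule like this also cannot tell whether integers are real counts or just codes, so treat its answer as a first guess.

```python
from numbers import Integral, Real


def classify(values, ordered=False):
    """Give a rough data-type label for a column of values (heuristic sketch).

    `ordered` is a hypothetical flag: the caller states whether the
    categories have a meaningful order, since code cannot know that.
    """
    # bool is a subclass of int in Python, so check for it before numbers.
    if all(isinstance(v, bool) for v in values):
        return "categorical: binary"
    if all(isinstance(v, Integral) for v in values):
        return "numerical: discrete"
    if all(isinstance(v, Real) for v in values):
        return "numerical: continuous"
    # Non-numeric values: two distinct groups means binary.
    if len(set(values)) == 2:
        return "categorical: binary"
    return "categorical: ordinal" if ordered else "categorical: nominal"


# Invented sample values, loosely modelled on the variables in the lecture.
total_time = [12.5, 3.0, 48.25]
reclaim_type = ["fee", "delay", "wrong amount"]
iterations = [1, 2, 2, 5]

print(classify(total_time))
print(classify(reclaim_type))
print(classify(iterations))
```

With these invented values, the total time comes out as continuous, the type of reclaim as nominal, and the number of iterations as discrete, matching the reasoning in the example.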
So, you might be wondering why we make this distinction. The first reason is the information density. Numerical data contains more information than categorical data. Sounds abstract, right? So let me give you an example. You are the head nurse of a surgical department, and it is known that your patients with a hip fracture should leave the hospital within 6 days. You want to see if this is feasible and collect data from the electronic patient files. You have these first five patients. Now you have a choice. You can record whether the patients have left within 6 days, and you fill in yes or no. What type of data is this? Well, it's binary, and therefore categorical data. You could also have recorded how many days the patient was in the hospital. What type of data is that? Well, it's numerical data. With the variable duration, we can always turn the data into a yes-or-no column. However, you can never go back from yes or no to the number of days. And that is what I mean when I say that numerical data contains more information than categorical data. Hence, always collect your data in a numerical way, at least if you have a choice, of course.
What this also implies is that, for categorical data, you need more observations than for numerical data. As a rule of thumb, for a numerical CTQ you typically need 30 observations, while for a categorical CTQ you need a lot more: at least 300 observations.
The second reason to distinguish between numerical and categorical data is a very practical one: it determines which tool you should use. If your Y variable is numerical, you can make a histogram or a boxplot. However, if your Y variable is categorical, you can make a pie chart, bar chart, or a Pareto chart. We will see all these charts in the videos on visualizing a single variable.
I started this video by asking you a question. What type of variables are you studying? We've made a distinction between numerical and categorical variables. And there are two reasons to do this. First, they contain different levels of information. And second, you will know which statistical tool you can apply.
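The first reason, information density, can be made concrete with the hip fracture example. The sketch below is only an illustration under stated assumptions: the five lengths of stay are invented, and the names days_in_hospital, TARGET_DAYS, and left_within_target are hypothetical.

```python
# Hypothetical lengths of stay (in days) for five hip-fracture patients.
# These numbers are invented for illustration only.
days_in_hospital = [4, 7, 5, 9, 6]

TARGET_DAYS = 6  # patients should leave within 6 days


def left_within_target(days, target=TARGET_DAYS):
    """Reduce a numerical duration to a binary yes/no label."""
    return "yes" if days <= target else "no"


# Going from numerical to categorical is always possible.
within_target = [left_within_target(d) for d in days_in_hospital]
print(within_target)

# Going back is not: several different durations share the same label.
durations_by_label = {}
for d, label in zip(days_in_hospital, within_target):
    durations_by_label.setdefault(label, []).append(d)
print(durations_by_label)
```

In this made-up sample, stays of 4, 5, and 6 days all become "yes", so the yes-or-no column alone can no longer tell you how long anyone stayed.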