You have gathered data and wish to answer some question. Any type of analysis will start by first looking at descriptive statistics. This is common practice before you dive into any calculation or analysis. In this video, I will show you how to describe and summarize data in just a few simple statistics. Let's take a look at an example. Imagine you are a producer of coffee, and you want to sell your coffee as decaf. Therefore, you measure the caffeine percentage of batches of coffee beans. If the caffeine percentage is below 0.1%, the batch can be sold as decaf. The data you gathered look like this. For each batch, you recorded the batch number and the caffeine percentage measured for that batch. You have measured 40 batches in total. Now you wonder, how well does my production process do? Am I producing decaf coffee? You may be asked to tell what you know about this data set. It does not really make sense to read aloud all 40 measurements. Instead, we would like to summarize these values into just a few key figures. And these key figures are called descriptive statistics. Do you recall that there are two types of data? Numerical data are numbers, while categorical data are classes and categories. The descriptive statistics in this video are only available for numerical data because they are based on mathematical operations like addition and multiplication. For categorical data, you can make a table of counts per category instead. Let's take a look at how to calculate the descriptive statistics with Minitab. Pause the video and load your data before you continue. This is what your data should look like in Minitab, with the batch number in one column and Caffeine% in the other. You can ask Minitab for descriptive statistics under the Stat menu, under Basic Statistics. The first option there is Display Descriptive Statistics. Which variables do you want to see? Well, that's of course Caffeine%. Okay, and now in your session window, which is here, you will find the descriptive statistics for Caffeine%.
Here you see some descriptive statistics of the variable Caffeine%. Minitab reports N and N*. N indicates the number of measurements in your data, and N* indicates the number of missing values in your data. 40 values are available, and 0 are missing in this example. You have probably heard of the mean, which is also called the average. It is the sum of all measurements divided by N, and it is often used to indicate the location of your data. The average caffeine percentage is 0.08%. So on average, the coffee is decaf coffee. However, there is always a certain degree of variation or dispersion in your data. A low amount of dispersion means that most values are rather close to the mean. A high amount of dispersion means that most values are quite far away from the mean and spread out over a larger interval. To measure dispersion, we look at the standard deviation. Let's have a look at what a standard deviation means. Consider these two datasets. The points are the data values or measurements, and the straight lines are the means. You can see that the dispersion or variation on the left is lower than the dispersion on the right. But how would we quantify this in a number? For each point, we look at the distance to the mean, shown by the red lines. We would like to average these distances to quantify dispersion. There's one problem, however. As these deviations are both positive and negative, they cancel out, and their average will be zero in both cases. So, to solve this problem, we square the distances first, because squaring will give us positive numbers. The average squared distance is called the variance; taking the square root again gives us the standard deviation. A standard deviation of 0.016, as in this example, can be interpreted as follows: observations are typically about 0.016 percentage points away from the mean. Let's take a look at the other part of the descriptive statistics. We have the minimum and the maximum, and they are quite easy to understand.
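The reasoning behind the variance and the standard deviation can be followed step by step in a short Python sketch. The eight caffeine values below are made up for illustration and are not the 40 batches from the video. One assumption to flag: statistical packages usually divide the sum of squared distances by N - 1 rather than N when the data are a sample, so the sketch shows both versions and cross-checks the N - 1 version against Python's `statistics.stdev`.

```python
import math
import statistics

# Hypothetical caffeine percentages for eight batches
# (made-up values for illustration, NOT the data from the video).
caffeine = [0.06, 0.07, 0.08, 0.08, 0.09, 0.09, 0.10, 0.07]

n = len(caffeine)
mean = sum(caffeine) / n

# Distance of every point to the mean (the "red lines").
deviations = [x - mean for x in caffeine]

# Positive and negative deviations cancel, so their average is zero
# (up to floating-point rounding) and says nothing about dispersion.
mean_deviation = sum(deviations) / n

# Squaring makes every distance positive.
squared = [d ** 2 for d in deviations]

variance_n = sum(squared) / n             # plain average of squared distances
variance_sample = sum(squared) / (n - 1)  # sample variance (divides by N - 1)
sd_sample = math.sqrt(variance_sample)    # standard deviation

print(f"mean               = {mean:.4f}")
print(f"average deviation  = {mean_deviation:.1e}")
print(f"variance (N)       = {variance_n:.6f}")
print(f"variance (N - 1)   = {variance_sample:.6f}")
print(f"standard deviation = {sd_sample:.4f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd_sample, statistics.stdev(caffeine))
```

With many measurements, such as the 40 batches in the example, the difference between dividing by N and by N - 1 is small.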
The minimum and the maximum are the lowest and the highest values in your dataset. If you order all of your data points from small to large, the median is the value that separates the smallest 50% from the largest 50% of your data. Just like the mean, the median is a measure of location. Compared to the mean, it has the advantage that it is hardly affected by outliers. The disadvantage is that it does not use all your data points. It just uses the middle value, after all, and it is therefore less precise than the mean. Q1 and Q3 stand for the first and the third quartile. Q1 is the value that separates the lowest 25% of the data from the highest 75% of the data. In this example, that is a value in between the 10th and the 11th value. Q3, the third quartile, separates the lowest 75% of the data from the highest 25% of the data. In your example, as N is 40, this will be a value in between the 30th and the 31st value. In some cases, the range is used to indicate dispersion. It is the difference between the highest and the lowest value, that is, your maximum minus your minimum. You might understand that a single outlier in your dataset can dramatically influence the range. Therefore, the distance between Q1 and Q3 is more often used to quantify dispersion, and it is called the interquartile range. The interquartile range is less susceptible to the influence of outliers. Let's go back to our example of Caffeine%. Remember that we want it to be below 0.1% for decaf coffee. We see that on average, the caffeine content is below this. However, the dispersion in the data is relatively high, and we see that the maximum percentage of caffeine is higher than 0.1%. So not every batch can be sold as decaf. Summarizing, you learned how to let Minitab compute descriptive statistics. The mean and the median say something about the location of your data. The standard deviation and the variance say something about the dispersion in your data. The range and the interquartile range do that too.
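To see how the median, the quartiles, the range and the interquartile range behave, here is a small Python sketch. The values are again made up for illustration, and the helper `summarize` is a hypothetical name. The quartiles come from `statistics.quantiles` with the "exclusive" method, which is one common convention; software packages differ slightly in how they interpolate between neighbouring values, so results need not match another package to the last digit.

```python
import statistics


def summarize(values):
    """Return a dict with location and dispersion statistics."""
    ordered = sorted(values)
    # Three cut points that split the ordered data into four equal parts.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
    }


# Hypothetical caffeine percentages (made-up, NOT the data from the video).
caffeine = [0.06, 0.07, 0.07, 0.08, 0.08, 0.09, 0.09, 0.12]

# Same data, but the largest value is replaced by an extreme outlier.
with_outlier = caffeine[:-1] + [0.30]

base = summarize(caffeine)
extreme = summarize(with_outlier)

for name in ("mean", "median", "range", "iqr"):
    print(f"{name:>6}: {base[name]:.4f} -> {extreme[name]:.4f}")

# Batches at or above the decaf limit of 0.1% cannot be sold as decaf.
too_high = [x for x in caffeine if x >= 0.1]
print("batches above the limit:", len(too_high))
```

In this sketch the single outlier shifts the mean and inflates the range, while the median and the interquartile range stay where they were.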