In the previous video, we introduced you to some key terms and concepts of statistics. In this video, we'll learn about one of the two main branches of statistics: descriptive statistics. As I mentioned, the two main branches are descriptive and inferential statistics. Each provides a different type of insight from data, and used together they give us a more complete picture. Descriptive statistics describe important characteristics of data, which we can summarize and display in charts, tables, and graphs. This includes measures of central tendency, such as the mean, median, and mode, and measures of dispersion, such as the standard deviation, variance, and range. Don't worry if you're not familiar with these terms yet; we'll define and describe them in the next slides.
Measures of central tendency are one kind of summary statistic. Each measure is a single value that attempts to describe a sample by identifying the central position within that sample. The mean, median, and mode are all measures of central tendency. The mean, commonly known as the average, is the most popular and well-known measure of central tendency. It's equal to the sum of all the values in the dataset divided by the number of values. So if a dataset has n values, x₁, x₂, and so on up to xₙ, the sample mean, usually written x̄ ("x bar"), is x̄ = (x₁ + x₂ + ... + xₙ) / n, as shown in the formula on the screen.
One thing to keep in mind when using the mean as a measure of central tendency is that it's very sensitive to outliers, which are unusually small or large values. For example, if most of the fish in our Bass Lake sample are less than eight centimeters long but a few are twice that size, our mean will be pulled upward by those large values and won't accurately represent a typical fish in the sample. One way to deal with this is to use the median instead. The median is the middle value in our dataset when we arrange all of the values in order; if there's an even number of values, it's the average of the two middle values.
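To make the outlier effect concrete, here's a minimal Python sketch using made-up fish lengths. The values and the helper names `sample_mean` and `sample_median` are illustrative assumptions, not taken from the lesson's slides. It computes the mean and median with and without two unusually large fish, and checks the result against Python's built-in `statistics` module.

```python
import statistics

# Hypothetical fish lengths in centimeters (made-up values for illustration).
# Most fish are under 8 cm; the last two are roughly twice as long.
lengths_cm = [4, 5, 5, 6, 6, 6, 7, 7, 14, 15]
typical_only = lengths_cm[:-2]  # the same sample without the two large fish


def sample_mean(values):
    """Sum of all values divided by the number of values (x-bar)."""
    return sum(values) / len(values)


def sample_median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2


for label, data in [("with large fish", lengths_cm), ("without large fish", typical_only)]:
    print(f"{label}: mean = {sample_mean(data):.2f} cm, median = {sample_median(data):.1f} cm")

# The standard library's versions give the same answers for the full sample.
print(statistics.mean(lengths_cm), statistics.median(lengths_cm))
```

With these made-up numbers, the two large fish pull the mean up from 5.75 cm to 7.50 cm, while the median stays at 6.0 cm.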
As shown on the screen, the median fish length is 6 centimeters. The mode is simply the most frequent value in the dataset, and it's often used to describe categorical data, where it represents the most common or popular category. For example, if fish in Bass Lake come in three colors and we observe that most of the fish are green, then green is the mode for the color variable.
In addition to measures of central tendency, we also have measures of dispersion, which describe how spread out a dataset is. The variance is the average of the squared differences between each data point and the mean (for a sample, we usually divide by n − 1 rather than n). On its own, this number is hard to interpret because it's in squared units, such as square centimeters for fish length, which is why we mostly use it to compute the standard deviation: the square root of the variance. Standard deviation is the most common measure of dispersion, and because it's in the same units as the data, it tells us directly how spread out our data is around the mean. A larger standard deviation means our data is more spread out; a smaller one means more of our data points are close to the mean.
The range of a dataset is the difference between the largest and smallest observations. While it's easy to calculate, it's very sensitive to outliers and doesn't tell us much about the dataset beyond its two extreme values. The interquartile range, on the other hand, is the difference between the 25th and 75th percentiles of the data. In other words, it spans the middle 50 percent of the values, so it's less affected by outliers and can be more useful for understanding how the data is distributed.
We use descriptive statistics when we want to understand and describe only our sample data, summarizing it with well-defined quantitative measures. Exploring a dataset this way is a core part of exploratory data analysis, or EDA, which is a big part of the data science workflow.
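Here's a minimal Python sketch of the mode and the dispersion measures, again using made-up fish data (the colors and lengths are illustrative assumptions, not values from the slides). It computes the sample variance by hand with n − 1 in the denominator, takes its square root for the standard deviation, and uses the standard library's `statistics.quantiles` for the quartiles. That function uses its "exclusive" method by default; other tools, such as spreadsheets or NumPy, may interpolate quartiles differently, so their IQR can differ slightly.

```python
import math
import statistics

# Hypothetical observations (made-up values for illustration).
fish_colors = ["green", "blue", "green", "yellow", "green", "blue"]
lengths_cm = [4, 5, 5, 6, 6, 6, 7, 7, 14, 15]

# Mode: the most frequent value; it also works for categorical data such as color.
color_mode = statistics.mode(fish_colors)

# Sample variance: squared deviations from the mean, divided by n - 1.
n = len(lengths_cm)
mean = sum(lengths_cm) / n
variance = sum((x - mean) ** 2 for x in lengths_cm) / (n - 1)

# Standard deviation: square root of the variance, back in centimeters.
std_dev = math.sqrt(variance)

# Range: largest minus smallest observation.
value_range = max(lengths_cm) - min(lengths_cm)

# Interquartile range: 75th percentile (Q3) minus 25th percentile (Q1).
q1, _median, q3 = statistics.quantiles(lengths_cm, n=4)
iqr = q3 - q1

print(f"mode of color: {color_mode}")
print(f"variance: {variance:.2f} cm^2, standard deviation: {std_dev:.2f} cm")
print(f"range: {value_range} cm, IQR: {iqr:.2f} cm (Q1 = {q1}, Q3 = {q3})")
```

With these values, the script reports a variance of 14.50 cm², a standard deviation of about 3.81 cm, a range of 11 cm, and an IQR of 3.75 cm. The two large fish stretch the range to almost three times the IQR, which is exactly the outlier sensitivity described above.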
Descriptive statistics are common in scientific and medical research, where they're used to describe the study or its sample population. For example, we might look at the mean and median number of days that gray whales take to complete their annual migration, or at the standard deviation of patients' blood pressure in a clinical trial. We also often see descriptive statistics in business applications, such as summarizing characteristics of customer behavior. Keep in mind that descriptive statistics don't let us draw conclusions beyond the data we've analyzed or reach conclusions about any hypotheses we might have made. They are simply a way to describe our sample dataset.