0:02

In this section of our lecture,

we present some important descriptive numbers

that measure the variability of the observations around the mean.

Usually, to indicate this variability,

we speak about the spread of the observations around the mean.

In particular, in this lecture,

we'll see the range,

the variance, and the standard deviation.

We need to keep in mind that variation exists in every context.

There is always a mean value, and the individual values lie around that mean;

how far they lie from it is indicated by the spread.

For example, in finance, the variation from the mean represents the risk.

A student studies, on average, eight hours per day.

This is a mean value.

The student may study 10 hours on some days,

six hours on other days, and so forth.

It is interesting to notice that while two datasets can have the same mean,

one set of observations can have a higher degree of dispersion around its mean

compared with the other set.

This means that, given two sets of observations,

the individual observations in one set

may deviate more from the mean than the observations in the other set.

For example, in sample A,

we have 1, 2,

1, and 40, and in sample B we have 9,

10, 11, and 14.

If we calculate the mean,

it is equal to 11 for both sets.

However, we can notice that the data in the first sample

are further from the mean value than the data in the second sample.

This is why, once we get the mean,

it is also important for us to know a measure of this spread around the mean.
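To make the comparison of sample A and sample B concrete, here is a minimal Python sketch. It is not part of the lecture; the variable names are chosen only for illustration, and the absolute distance from the mean is used here simply as a quick way to look at the spread.

```python
from statistics import mean

sample_a = [1, 2, 1, 40]    # sample A from the lecture
sample_b = [9, 10, 11, 14]  # sample B from the lecture

mean_a = mean(sample_a)
mean_b = mean(sample_b)

# Absolute distance of each observation from the mean of its own sample
distances_a = [abs(x - mean_a) for x in sample_a]
distances_b = [abs(x - mean_b) for x in sample_b]

print("means:", mean_a, mean_b)
print("distances in A:", distances_a)
print("distances in B:", distances_b)
```

Both means are 11, but the distances in sample A (10, 9, 10, 29) are much larger than those in sample B (2, 1, 0, 3).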

2:49

The range is the difference between the largest observation and the smallest observation.

The greater the spread of the data from the center of the distribution,

the larger the range.

Since the range takes into account only the largest and the smallest observations,

it might give a distorted picture of the data.

In particular, this is likely to happen if there is an unusually extreme observation.

This is why, although the range measures the total spread of the data,

it might not be a satisfactory measure of the variability.
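As a small illustration of the range and of how a single extreme observation can distort it, consider the following Python sketch. The function name `value_range` and the extra value 100 are assumptions made up for this example; only samples A and B come from the lecture.

```python
def value_range(data):
    """Return the largest observation minus the smallest one."""
    return max(data) - min(data)

sample_a = [1, 2, 1, 40]    # sample A from the lecture
sample_b = [9, 10, 11, 14]  # sample B from the lecture

range_a = value_range(sample_a)
range_b = value_range(sample_b)

# Hypothetical: add one extreme value (100 is invented for illustration)
sample_b_with_extreme = sample_b + [100]
range_b_with_extreme = value_range(sample_b_with_extreme)

print(range_a, range_b, range_b_with_extreme)
```

The range of sample B is 5, but with the single invented value 100 added it jumps to 91, even though the other four observations have not changed.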

3:38

The unusual extreme observations we can have in our data are called outliers.

Basically, the outliers are either very high or very low observations.

The influence of the outliers may distort our final understanding of the data.

One way usually used to avoid

this drawback is to set the data in ascending or descending order.

Afterwards, we discard some of the highest and some of the lowest numbers.

Finally, we find the range of those remaining.
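The procedure just described (sort, discard, take the range of what remains) can be sketched in Python as follows. The lecture does not say how many observations to discard, so the parameter `k` and the function name `trimmed_range` are assumptions for illustration.

```python
def trimmed_range(data, k):
    """Range of the data after discarding the k lowest and k highest values."""
    if k < 0 or 2 * k >= len(data):
        raise ValueError("k must be non-negative and leave at least one observation")
    ordered = sorted(data)                    # ascending order
    remaining = ordered[k:len(ordered) - k]   # drop k values at each end
    return remaining[-1] - remaining[0]

sample_a = [1, 2, 1, 40]  # sample A from the lecture
full_range = max(sample_a) - min(sample_a)
trimmed = trimmed_range(sample_a, k=1)

print(full_range, trimmed)
```

For sample A, discarding one value at each end removes the observation 40, and the range drops from 39 to 1.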

4:49

Although the range measures the spread of the data,

we need a measure that averages

the total distance between each of the data values and the mean.

This is where the variance and the standard deviation come in.

Notice that for all datasets,

if we sum up all the differences between each of the data values and the mean,

the result will always be equal to zero.

This is understandable if we consider that the mean is the center of the data.
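The reason the differences from the mean always sum to zero can be written out in one line. The notation is an assumption, since the lecture does not spell it out here: x_i denotes the i-th observation, n the number of observations, and x-bar the mean.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```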

5:33

If the data value is below the mean value,

the difference between the data value and the mean is negative, and vice versa,

so in the sum the negative and the positive differences cancel out.

This is why we square these differences.

Then each observation, both above and below the mean,

contributes to the sum of the squared terms.

6:02

The variance represents the average of the squared terms.

The population variance is indicated by the Greek letter sigma squared.

The variance is the sum of the squared differences between

each observation and the population mean, divided by

the population size N.
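Written as a formula, the verbal definition of the population variance reads as follows. The symbol mu for the population mean and the index i are notational assumptions; sigma squared and N are the symbols named in the lecture.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```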

6:32

The sample variance is indicated by the capital letter S squared.

The sample variance is the sum of

the squared differences between each observation and the sample mean,

divided by the sample size minus one.
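The same definition for a sample can be written as a formula. Here x-bar for the sample mean and n for the sample size are notational assumptions; many textbooks write the sample variance with a lowercase s, while S squared follows the wording of the lecture.

```latex
S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```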

6:54

Notice that for sample data,

the variance is found by dividing the numerator by n minus one and not by n. Why?

Because mathematical statisticians have shown that, if the population variance is unknown,

the sample variance is a better estimator of the population variance

if the denominator is n minus one.
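To see the effect of the two denominators side by side, here is a minimal Python sketch. The helper names are assumptions for illustration; `statistics.pvariance` and `statistics.variance` are the standard-library functions that divide by N and by n minus one respectively, and they are used here only as a cross-check.

```python
import statistics

def sum_of_squares(data):
    """Sum of squared differences between each observation and the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

def population_variance(data):
    # Divide by the number of observations N
    return sum_of_squares(data) / len(data)

def sample_variance(data):
    # Divide by the number of observations minus one
    return sum_of_squares(data) / (len(data) - 1)

sample_b = [9, 10, 11, 14]  # sample B from the lecture

pop_var = population_variance(sample_b)
samp_var = sample_variance(sample_b)

print(pop_var, statistics.pvariance(sample_b))
print(samp_var, statistics.variance(sample_b))
```

For sample B the sum of squared differences is 14, so dividing by 4 gives 3.5 and dividing by 3 gives roughly 4.67. The sample formula always gives the larger value for the same data.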

7:42

Now we need to define the other important numerical value,

that is, the standard deviation.

The standard deviation is the square root of the variance.

This is why it is expressed in the original measurement unit of the data.

The standard deviation measures the average spread around the mean.

The population standard deviation is the positive square root of the population variance.

That is, it is the square root of

the sum of the squared differences between each observation and the population mean,

all divided by the population size N.

The sample standard deviation is the positive square root of the sample variance.

That is, it is the square root of the sum of

the squared differences between each observation and the sample mean,

all divided by the sample size n minus one.
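In formula form, the two standard deviations are the square roots of the corresponding variances. As with the sample variance defined earlier, the divisor under the root for a sample is n minus one. The symbols mu, x-bar, and the index i are notational assumptions.

```latex
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2},
\qquad
S = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
```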

8:59

For example, calculate the standard deviation of the following data:

6, 8, 10, 12, 14, 9, 11, 7, 13, 11.

We need to follow three steps.

Step one, we need to calculate the sample mean.

We sum up 6 plus 8,

plus 10, plus 12,

plus 14, plus 9, plus 11,

plus 7, plus 13, plus 11,

and we divide the total by the number of observations, which is 10.

This will be equal to 10.1.

Step two, we find the difference between each of the data values and the mean, which is 10.1.

Then we have 6 minus 10.1,

plus 8 minus 10.1,

plus 10 minus 10.1,

plus 12 minus 10.1,

plus 14 minus 10.1,

plus 9 minus 10.1,

plus 11 minus 10.1,

plus 7 minus 10.1,

plus 13 minus 10.1,

plus 11 minus 10.1.

Added together, these differences are equal to zero.

Step three, we need to square each difference, and then we

have 6 minus 10.1 to the power of 2,

plus 8 minus 10.1 to the power of 2,

plus 10 minus 10.1 to the power of 2,

plus 12 minus 10.1 to the power of 2,

plus 14 minus 10.1 to the power of 2,

plus 9 minus 10.1 to the power of 2,

plus 11 minus 10.1 to the power of 2,

plus 7 minus 10.1 to the power of 2,

plus 13 minus 10.1 to the power of 2,

plus 11 minus 10.1 to the power of 2,

which is equal to 60.9.

12:00

Then we can calculate the variance.

We do it by dividing the sum of the squared differences by the number of observations minus one,

which is 10 minus 1, that is, 9,

and then we have 60.9 over 9, which is approximately 6.77.

Finally, we can calculate

the standard deviation by taking the square root of the variance.

The square root of 6.77 is approximately 2.6.
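The whole worked example can be checked with a short Python sketch that follows the same steps. The variable names are assumptions for illustration; `statistics.stdev` is the standard-library sample standard deviation (divisor n minus one) and is used here only as a cross-check.

```python
import math
import statistics

data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]
n = len(data)

# Step one: the sample mean
mean = sum(data) / n

# Step two: the difference between each value and the mean
# (their sum is zero, up to floating-point rounding)
deviations = [x - mean for x in data]
deviation_sum = sum(deviations)

# Step three: square each difference and add them up
sum_squares = sum(d ** 2 for d in deviations)

# Sample variance (divide by n - 1) and sample standard deviation
variance = sum_squares / (n - 1)
std_dev = math.sqrt(variance)

print("mean:", round(mean, 1))
print("sum of squared differences:", round(sum_squares, 1))
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
print("cross-check:", round(statistics.stdev(data), 2))
```

Run as written, the sketch should report a mean of 10.1, a sum of squared differences of about 60.9, a variance of about 6.77, and a standard deviation of about 2.6.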
