[MUSIC] Assume that we have several observations in our data set and we are interested in a particular variable, for example, age. What we want is a value that is, in a sense, typical of this variable across our data set. There are several possible answers to this kind of question, and these answers are called measures of central tendency.
Assume that our variable takes numeric values. Then the first measure of central tendency, probably the best known, is the average, or mean. Let me assume that I have values denoted by x1, x2, and so on up to xn, and I will denote the whole collection by the letter x. Then the average of x, or the mean of x, is just the arithmetic average of these numbers. It is usually denoted by x bar, and it is the sum of these values divided by the number of values, that is, x bar = (x1 + x2 + ... + xn) / n. This is a very well-known measure of central tendency. Sometimes we say that the average age of our respondents is, for example, 35 years, and a reader of our report understands that we are considering adults and not children. So this average gives us some useful information about the data set.
But the average is not the only possible measure of central tendency. Another one is the median. What is the median? It is easier to explain with an example. Let us assume that my x is a series of numbers like this one. Let me rearrange the sequence in ascending order, that is, I just sort these values. In this case I have 2, then 2, then 3, then 5, then 17. Then I pick the element that is in the middle of this sorted sequence, here 3. This element is called the median. If there is an even number of elements in our original data set, then there is not one element in the middle but two. In this case, the median is just the average of these two middle elements, but that is a rather technical remark.
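The two definitions above can be checked with a small Python sketch. It is only an illustration, not part of the lecture: the unsorted order of the values and the helper name `median_by_sorting` are assumptions, since the lecture gives only the sorted sequence 2, 2, 3, 5, 17.

```python
# Illustrative sketch: mean and median computed directly from their definitions.
# The unsorted order of x is an assumption; only the sorted values come from the lecture.
x = [5, 2, 17, 3, 2]

# Mean: the sum of the values divided by the number of values (x bar).
x_bar = sum(x) / len(x)


def median_by_sorting(values):
    """Sort the values and take the middle element.

    With an even number of values, average the two middle elements.
    Assumes a non-empty list of numbers.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


x_median = median_by_sorting(x)               # odd case: the single middle element
even_median = median_by_sorting([2, 2, 3, 5])  # even case: average of 2 and 3

print(x_bar, x_median, even_median)
```

For everyday use, the standard library functions `statistics.mean` and `statistics.median` do the same job; the hand-written version is shown only to make the sorting step visible.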
The idea of the median is that it is an element such that half of the elements lie above the median and the other half lie below it. Sometimes it is important to use the median and not the average. For example, suppose we discuss incomes in some country. We understand that in this case there are relatively few people with very large incomes, and most people do not have such large incomes. Those few people with very large incomes can affect the average rather significantly. But for the median, if I replace, for example, this number 17 with 70,000, it does not change the median of the set at all. So in a sense the median is a more robust measure of central tendency. It is not sensitive to values that are too large or too small, as long as there are not many of them. And this is a good property of the median when we discuss such things as, for example, the median wage.
Another advantage of the median is that it can be applied not only to numeric data but also to ordered categorical data. If we have an odd number of elements, the only operation we need to find the median is to sort these elements, and to sort them we just have to compare which one is larger. So even if you do not have numeric values but only categorical, ordered values, you can find the median of the corresponding data set.
Next, let us discuss a third popular measure of central tendency, which is called the mode. The mode is just the most frequent value that we meet in our data set. The advantage of the mode is that it can be applied to unordered categorical variables, for which even the median is not defined because we cannot compare the corresponding values. For example, suppose we have a data set with a variable like home town, and our x is something like Moscow, New York, London, again New York, again Moscow, and again New York. In this case we can find the mode of x, and it is New York, because New York is the most frequent value in this set.
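Going back to the two claims about the median made above, here is a short Python sketch, not from the lecture, that illustrates them. It assumes the data set is 2, 2, 3, 5, 17 with the largest value, 17, replaced by 70,000, and it uses a made-up ordered scale (`low`, `medium`, `high`) for the categorical case.

```python
from statistics import mean, median

# Robustness: replace the largest value with a huge one and compare.
x = [2, 2, 3, 5, 17]
x_outlier = [2, 2, 3, 5, 70_000]

mean_before, mean_after = mean(x), mean(x_outlier)
median_before, median_after = median(x), median(x_outlier)

print("mean:  ", mean_before, "->", mean_after)      # changes dramatically
print("median:", median_before, "->", median_after)  # stays the same

# Ordered categorical data: only comparisons are needed, not arithmetic.
# The scale and the responses are hypothetical examples.
scale = ["low", "medium", "high"]
rank = {label: position for position, label in enumerate(scale)}
responses = ["medium", "low", "high", "medium", "low"]

# Sort by position on the scale, then take the middle element (odd count).
ordered = sorted(responses, key=rank.__getitem__)
ordinal_median = ordered[len(ordered) // 2]
print("ordinal median:", ordinal_median)
```

With an even number of ordered categories there is no way to average the two middle labels, so one would have to settle on a convention, such as taking the lower of the two.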
So the mode of x is New York, the most frequent value. Of course, it is possible that there are several most frequent values in the data set. For example, if I add one more Moscow to this data set, then we will have three elements with the value New York and three elements with the value Moscow, and both of them can claim to be the mode of x. In this case we have to select one of them in some arbitrary way. For example, we can sort the corresponding labels alphabetically and select the first label under this sorting; in this case it will be Moscow.
The mode is defined for any values, that is, for variables of any type. But it is mostly useful for categorical and ordinal variables that do not have too many distinct values.
Measures of central tendency describe, in a sense, a typical element of our data set. But it is also important to understand how far the elements of our data set deviate from this typical value. To do so, we have to consider some other descriptive statistics, for example, variance. Let us proceed to them. [MUSIC]
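As a supplement to the discussion of the mode, the home town example and the alphabetical tie-breaking rule can be sketched in Python. The function name `modes` is an assumption made for illustration, and alphabetical order is only one of many possible arbitrary choices for breaking ties.

```python
from collections import Counter


def modes(values):
    """Return all most frequent values, sorted alphabetically.

    Assumes a non-empty list of comparable labels.
    """
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


x = ["Moscow", "New York", "London", "New York", "Moscow", "New York"]
single_mode = modes(x)   # New York appears three times, more than any other value

x_tied = x + ["Moscow"]  # now Moscow and New York both appear three times
tied_modes = modes(x_tied)
chosen_mode = tied_modes[0]  # tie broken by taking the first label alphabetically

print(single_mode, tied_modes, chosen_mode)
```

Returning the full list of tied values, rather than a single one, leaves the choice of tie-breaking rule to the caller.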