The first step in analytics or statistics is to take a good look at your data. Before you begin, try to understand what kind of variable you are working with, because the type of variable determines which kinds of analysis can be performed with it. Let's look at the different types of data that we commonly encounter in our daily lives.

The most common is cross-sectional data, which consists of measurements taken at one point in time. A census in a given year is a cross-section of society, and when students evaluate a course and its instructor, that is also a cross-section at a given point in time. In contrast to cross-sectional data, we can have panel data, sometimes called cross-sectional panel data, which essentially means asking the same group of individuals the same questions repeatedly over time. So you may pick a group of people, constitute them as a panel, and then ask them the same questions once every year over a given period. Time series data is rather different. You look at a particular phenomenon, such as the unemployment rate, measure it every month, and then display or analyze those repeated measurements of the same phenomenon over time. So you may have monthly data going back to the 1940s, or climate data going back hundreds of years. Based on the type of data, whether cross-sectional, panel, or time series, we pick the appropriate statistical tools to deal with it. If your data set has only one variable, it is called a univariate data set; if it has multiple variables, it is a multivariate data set.

Let us now look at variable types, starting with categorical, or nominal, variables. Consider home ownership, for instance. One can either own a home or rent a home, and because there are only two categories here, owning and renting, home ownership (an individual's tenure status) is a categorical variable.
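To make the three data structures concrete, here is a minimal sketch on an entirely hypothetical survey of three households whose tenure status (own or rent) is recorded every year from 2020 to 2022. Taking one year gives a cross-section, following one household across years gives its panel history, and summarizing the share of owners per year gives a single time series. The records and the function names `cross_section`, `panel_history`, and `ownership_series` are invented for illustration only.

```python
# Hypothetical survey records: (household_id, year, tenure).
# All values are made up purely for illustration.
records = [
    (1, 2020, "own"), (2, 2020, "rent"), (3, 2020, "own"),
    (1, 2021, "own"), (2, 2021, "own"), (3, 2021, "rent"),
    (1, 2022, "own"), (2, 2022, "own"), (3, 2022, "own"),
]


def cross_section(rows, year):
    """All households measured at one point in time."""
    return [r for r in rows if r[1] == year]


def panel_history(rows, household_id):
    """The same household asked the same question in every wave."""
    return [(year, tenure) for hid, year, tenure in rows if hid == household_id]


def ownership_series(rows):
    """One phenomenon (the share of owners) measured repeatedly over time."""
    series = {}
    for year in sorted({y for _, y, _ in rows}):
        wave = cross_section(rows, year)
        series[year] = sum(1 for _, _, t in wave if t == "own") / len(wave)
    return series


print(cross_section(records, 2020))
print(panel_history(records, 2))
print(ownership_series(records))
# Tenure takes only two values, own and rent, so it is categorical.
print(sorted({t for _, _, t in records}))
```

The same records serve all three views here only for compactness; in practice a time series like the unemployment rate is usually published directly as one measurement per period rather than derived from a panel.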
With home ownership, because there are only two choices, own or rent, it is a binary (sometimes called binomial) variable. Now consider travel choices. You can get to work by driving, by having someone drive you there so that you are a passenger, by taking public transit, or by walking or biking. In this case you have four choices, more than two, so we call it a multinomial variable. Both binary and multinomial variables are categorical variables. There are no quantitative relationships among the categories, and for these types of variables averages are meaningless: if travel mode has four categories, an "average" category tells you nothing useful.

A particular type of categorical variable is ordinal data, where the categories are ranked or ordered in some meaningful way. Take the number of cars owned by a household, for instance. A household may have zero cars, one car, two cars, or three or more cars. This is ordinal data: the categories have a natural order, so you cannot arbitrarily swap the codes for zero cars and one car the way you could relabel travel modes. The order in which the categories are recorded matters. Categories can be compared with one another (two cars is more than one), but you still cannot apply the usual arithmetic statistics such as means, and the differences between categories are not meaningful either; for example, the step from "two" to "three or more" is not a fixed amount.

Another type of data is ratio data, which has a natural zero. Sales dollars, distance, and the weight of an object are all examples of ratio data, and I will often use the term continuous data or continuous variable for them. A variable such as the distance from point A to point B could be 8 kilometers, 8.5 kilometers, or 6.2 miles. The variable is continuous, and zero makes logical sense for it: if you say you have $0, that zero means something, namely the absence of money. Ratio data is the strongest form of measurement, and you can compute both ratios and differences with it.
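As a rough sketch of how the level of measurement limits the summary you can compute, here is a Python example on made-up data. It follows standard conventions that go slightly beyond the lecture's own wording: the mode for a nominal variable such as travel mode, a middle rank (median) for an ordinal variable such as cars owned, and means and ratios only for ratio data such as distance. `statistics.median_low` is used so that the ordinal "median" is always one of the actual categories; the category list `CAR_ORDER` and all data values are hypothetical.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical observations, invented for illustration only.
travel_mode = ["drive", "transit", "drive", "walk/bike", "passenger", "drive"]
cars_owned = ["0", "2", "1", "3+", "1", "2", "1"]  # ordinal categories
distance_km = [8.0, 8.5, 10.0, 4.0]                # ratio scale

# Nominal: only counts and the mode are meaningful; no "average mode".
mode_of_travel = Counter(travel_mode).most_common(1)[0][0]

# Ordinal: rank the categories explicitly, then take the middle rank.
# Gaps between ranks are not assumed to be equal, so no mean is taken.
CAR_ORDER = ["0", "1", "2", "3+"]
ranks = [CAR_ORDER.index(c) for c in cars_owned]
median_cars = CAR_ORDER[median_low(ranks)]

# Ratio: a true zero makes means, differences, and ratios meaningful.
avg_distance = mean(distance_km)
ratio = distance_km[2] / distance_km[3]  # 10 km is 2.5 times 4 km

print(mode_of_travel, median_cars, avg_distance, ratio)
```

Running it prints `drive 1 7.625 2.5`. Encoding the order in `CAR_ORDER` rather than parsing "3+" as a number keeps the top-coded category honest: the code only ever compares positions, never subtracts them.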
Another type of variable is interval data. Interval variables are ordered and have a specific, consistent measure of distance between observations, but they may not have a natural zero. Temperature is a good example. When you say it is 0 degrees Celsius, that does not mean there is no temperature. It is freezing, but the thermometer is still measuring something that exists. As a result, ratios are meaningless. For example, if someone said the temperature in some African country is 50 degrees, compared with somewhere tropical where it is 25 degrees, it does not mean that the African desert is twice as hot as the tropics. We can, however, say that there is a difference of 25 degrees between the two places.
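A quick numeric check of the temperature example: converting the same two readings, 50 and 25 degrees Celsius, to Fahrenheit and to Kelvin (a scale with an absolute zero) keeps the difference consistent up to the change of unit, while the "twice as hot" ratio changes depending on where each scale puts its zero. The helper names `c_to_f` and `c_to_k` are illustrative, not from any library.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def c_to_k(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


desert_c, tropics_c = 50.0, 25.0

# Differences survive a change of units (they are only rescaled).
diff_c = desert_c - tropics_c                   # 25 degrees C
diff_k = c_to_k(desert_c) - c_to_k(tropics_c)   # 25 K, up to float rounding
diff_f = c_to_f(desert_c) - c_to_f(tropics_c)   # 45 degrees F = 25 * 9/5

# Ratios do not: the answer depends on the arbitrary zero point.
ratio_c = desert_c / tropics_c                  # 2.0
ratio_f = c_to_f(desert_c) / c_to_f(tropics_c)  # 122 / 77, about 1.58
ratio_k = c_to_k(desert_c) / c_to_k(tropics_c)  # about 1.08

for name, value in [("C", ratio_c), ("F", ratio_f), ("K", ratio_k)]:
    print(f"ratio in {name}: {value:.2f}")
```

The printed ratios are 2.00, 1.58, and 1.08, three different answers to the same question, whereas the difference is 25 degrees in both Celsius and Kelvin and 45 degrees in Fahrenheit, which is just 25 expressed in smaller units.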