Next, let us consider a survey conducted by a movie streaming service. 400 people were asked this question: in the last month, did you watch any movie from the genres thriller, horror or drama? We clearly have cardinal categorical data here, and also numerical data: the type of movie watched is categorical, and the number of people who watched each type is numerical. Note that the categories are not mutually exclusive, because the same person could have watched a thriller as well as a horror movie.

Let us say the following table gives the actual numbers that make up the survey's results. The survey was done on four age groups: up to 18 years old, 18-29, 30-55, and 56 and above. Age group is another categorical variable; however, this one is ordinal, because the ages have a clear order. Let us also assume that the same number of people from each age group were asked this question, which is generally possible in a survey design. If you add the numbers of people who said yes to thriller, horror and drama for just the up to 18 age group, 55 plus 58 plus 20 is 133, which is clearly larger than 100, the number of people in that group. This confirms that the same person could have said yes to more than one genre, for example yes to thriller and yes to horror.

Can we draw inferences from this? Yes. To begin with, let us plot the percentage of people who said that they watched movies from each genre. Among everybody who was asked, we can clearly see that over 50% watched thrillers, a little over 40% watched drama, and somewhere between 30 and 40% watched horror. This gives more information at a glance than the table alone, and this is how you can summarize the responses by the three genres: thriller, horror and drama.
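The arithmetic for the up to 18 age group can be sketched in a few lines of Python. The three counts are the ones quoted above; the group size of 100 is an assumption that follows from 400 respondents split equally over four age groups, and the variable names are only illustrative.

```python
# Yes counts for the "up to 18" age group, as quoted in the lecture.
yes_counts = {"thriller": 55, "horror": 58, "drama": 20}

# ASSUMPTION: 400 respondents split equally over 4 age groups.
GROUP_SIZE = 400 // 4

# Percentage of the group that said yes to each genre.
percent_yes = {genre: 100 * n / GROUP_SIZE for genre, n in yes_counts.items()}

# If the yes responses add up to more than the group size,
# some respondents must have said yes to more than one genre.
total_yes = sum(yes_counts.values())
multi_response = total_yes > GROUP_SIZE

for genre, pct in percent_yes.items():
    print(f"{genre}: {pct:.0f}% of the group said yes")
print(f"total yes responses: {total_yes} from {GROUP_SIZE} people")
print(f"overlapping answers present: {multi_response}")
```

The same per-genre percentage, computed over all 400 respondents instead of one age group, is what the first bar chart plots.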
Let us look at a further breakdown, this time based on age. The bar chart here shows the different age groups within each genre. For horror and drama, you can clearly see that the higher age groups are less and less interested in horror movies and more and more interested in drama. In the previous graph, horror had a lower overall viewership than drama. But once you look at age groups, you can see that more young people are watching horror than drama. Close to 60% of the up to 18 age group watched a horror movie, but only around 20% of the same age group watched drama. This breakdown gives a much clearer picture of what is happening. Similarly, thriller seems to peak for the 18-29 age group, with relatively fewer people interested on either side of the peak.

We had discussed the difference between cardinal and ordinal categories. So let us look at what influence they have on data visualization. Here we have three cardinal categories: thriller, horror and drama. Let us change the order to drama, thriller and horror. We have a new graph, but does it have any information that was absent earlier? The answer is no. Did the earlier graph have any information that is lost now? Again, the answer is no. In plotting and representation, the order of the cardinal categories does not matter.

But let us look at the ordinal categories, the age groups. Remember that in the current graph we observe a clear increasing trend for drama and a decreasing trend for horror: the higher the age, the more people responded that they watched a drama, and the fewer responded that they watched a horror movie. Let us shuffle the order of the age groups.
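What is at stake in reordering an ordinal category can be sketched in Python. This is only an illustration: the drama count for the up to 18 group (20) is the one quoted in the lecture, while the other three counts are hypothetical placeholders chosen to rise with age, and `trend` is a helper made up for this sketch.

```python
def trend(values):
    """Return 'increasing', 'decreasing' or 'none' for a sequence of numbers."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "decreasing"
    return "none"

# ASSUMPTION: only the "up to 18" value comes from the lecture;
# the remaining counts are hypothetical and merely rise with age.
drama_yes = {"up to 18": 20, "18-29": 35, "30-55": 50, "56 and above": 59}

natural_order = ["up to 18", "18-29", "30-55", "56 and above"]
shuffled_order = ["30-55", "up to 18", "56 and above", "18-29"]

natural_trend = trend([drama_yes[g] for g in natural_order])
shuffled_trend = trend([drama_yes[g] for g in shuffled_order])

print("natural age order :", natural_trend)
print("shuffled age order:", shuffled_trend)
```

The numbers are identical in both cases; only the order in which the bars would be drawn differs, and that alone decides whether the trend is visible.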
Now the first bar corresponds to the age group 30 to 55, the second to up to 18, the third to 56 and above, and the last to 18 to 29. How meaningful is this graph? Significant information seems to be lost. We no longer get a visual cue that there is an increasing trend, a decreasing trend, or any trend at all. The opposite can also happen: in many cases, because categories with no natural order are arranged in some spurious order, you observe trends where none exist. Both are bad, and observing a trend that does not exist can be the worse of the two. So one needs to be careful in identifying whether a category is cardinal or ordinal, and in deciding on the ordering.

Finally, there is also a choice in how the bars are grouped. In the current chart we have grouped by genre: all the data about the drama genre across the age groups is in the first block, all the data about the thriller genre across the age groups is in the second block, and so on. Let us flip the grouping. Let us put all the data about the up to 18 age group for all three genres in one place, all the data about the 18 to 29 age group in the next place, and so on. Again, this does not seem to be as useful a representation of the data, and again, it has a lot to do with whether the category was ordinal or cardinal. These are some of the rules of thumb one has to think about and plan carefully when visualizing data.

And finally, let me talk about data summarization. We could take the average of all the numbers in the table and get 42. We could take the median, or the standard deviation. But are these numbers meaningful? The answer is a sad no, because, as we already discussed, the same person could have watched a thriller as well as a horror movie. So it does not make much sense to take the average of everything when the same people are counted more than once.
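A rough Python sketch makes the point concrete. The full table is not reproduced in this transcript, so the table below is a hypothetical stand-in: only the up to 18 column (55, 58, 20) is quoted in the lecture, and the remaining values are placeholders chosen so that the overall average comes out to the 42 mentioned above.

```python
import statistics

AGE_GROUPS = ["up to 18", "18-29", "30-55", "56 and above"]

# ASSUMPTION: hypothetical table. Only the first value in each row
# (the "up to 18" group) is taken from the lecture.
table = {
    "thriller": [55, 65, 50, 40],
    "horror":   [58, 40, 22, 10],
    "drama":    [20, 35, 50, 59],
}
RESPONDENTS = 400  # stated in the lecture

# Flatten the table and compute the easy summary statistics.
all_counts = [n for row in table.values() for n in row]
mean_count = statistics.mean(all_counts)
median_count = statistics.median(all_counts)
stdev_count = statistics.stdev(all_counts)

# The cells count yes responses, not distinct people.
total_yes = sum(all_counts)

print(f"mean={mean_count}, median={median_count}, stdev={stdev_count:.1f}")
print(f"{total_yes} yes responses from only {RESPONDENTS} respondents")
```

The statistics are trivial to compute, but because the yes responses outnumber the respondents, the cells overlap and an average over them does not describe any well-defined group of people.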
Summary statistics might be easy to calculate, but one has to be careful about whether these numbers make sense.