Hello, and welcome back. My name is Mark Rulkowski and I'm a lecturer in the Department of Statistics here at the University of Michigan. Today, we're going to talk about categorical data and the different ways we can summarize and view it. Remember that categorical data simply classifies individuals or items into different groups. The example variable we're going to use is marital status. It's coded as shown at the bottom of the slide, where a value of one represents a married individual; two, widowed; three, divorced; four, separated; five, never married; six, living with partner; seven, refused; and eight, don't know. The table here shows just an example of six different individuals and their marital status.

When looking to summarize this data, the most common way to do that is with a frequency table, with either counts or percentages. Here we create a table with all the different categories that we have: the eight marital status categories (married, widowed, divorced, etc.), the corresponding counts, and the corresponding percentages. So, here we can see that out of the 5,560 individuals that we have in the data set, 2,683 of them are married, 467 of them are widowed, and so on. To find the percentage for each of these groups, we simply take the count for that group and divide by the total count. So, to get the percentage of married individuals in this data set, we take 2,683 and divide by our total count of 5,560, which gives about 0.483, or 48.3 percent of the people in our data set. This is the most common way to summarize categorical data. The most common way to visualize categorical data is with a bar chart.
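As a rough illustration of the frequency table described above, here is a minimal Python sketch. The code-to-label mapping follows the coding on the slide; the function name `frequency_table` and the six sample codes are hypothetical, made up for illustration, and are not the values from the lecture's table. The last line checks the married percentage using the two counts quoted in the lecture.

```python
from collections import Counter

# Coding scheme as described on the slide.
MARITAL_STATUS = {
    1: "Married",
    2: "Widowed",
    3: "Divorced",
    4: "Separated",
    5: "Never married",
    6: "Living with partner",
    7: "Refused",
    8: "Don't know",
}


def frequency_table(codes):
    """Return (label, count, percentage) for every category, in code order."""
    if not codes:
        raise ValueError("need at least one observation")
    counts = Counter(codes)
    total = len(codes)
    # Percentage = count for the group divided by the total count, times 100.
    return [
        (label, counts[code], 100 * counts[code] / total)
        for code, label in MARITAL_STATUS.items()
    ]


# Hypothetical sample of six individuals (NOT the values from the slide).
sample_codes = [1, 5, 1, 3, 2, 1]
table = frequency_table(sample_codes)
for label, count, pct in table:
    print(f"{label:<20}{count:>6}{pct:>8.1f}%")

# The lecture's worked example: 2,683 married out of 5,560 individuals.
married_pct = round(100 * 2683 / 5560, 1)
print(married_pct)
```

Categories that never appear in the sample still get a row with a count of zero, which keeps the table the same shape no matter which individuals are included.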
In the bar chart here, the x-axis shows marital status, with the eight marital categories that we had from the previous slide, and the y-axis is simply the frequency, or count, for each of the groups. So again, the first bar shows the 2,683 married people in our data set. We can also take this bar chart and, instead of counts, use the percentages that we also had in the frequency table. Here we see that the shape of the bars and the overall graph doesn't change at all, but now we have a different y-axis, with percentages instead of counts. Sometimes displaying percentages is more useful than the actual count values.

With nominal data such as marital status, where the order of the categories doesn't matter, sometimes we want to rearrange the bars in descending order. This can be more helpful if we want to know which groups make up the largest percentage of individuals in our data. So, here we see that married, never married, and divorced are the top three categories. So, with nominal data, you can rearrange the bars in a more useful manner.

Another option for graphing categorical data is a pie chart. We don't recommend pie charts as much as bar charts, for a couple of reasons. First, as you can see with the labeling, sometimes labels overlap for very, very small slices. As we can see with refused and don't know, the labels run over each other, which makes them hard to read. This can be hard to work around, depending on the program that you're using. The other main issue with pie charts is that, without proper labeling, it's hard to see which slice of the pie is bigger than the others. In this example, widowed, divorced, and living with partner are all very similar in size, and without the percentages shown there, you might not know which of them represents a larger portion of the overall graph.
Here we can see that divorced is the largest of those three, because it says 10.3 percent next to it, but without that label it might be hard to discern. So, we try to stay away from pie charts and lean more towards bar charts.

So, now we've seen that with categorical data, the best method of summary is usually just a frequency table with counts, percentages, or both. Bar charts are a great method of visualization, again with either counts or percentages. If you choose to use a pie chart, please do so with caution.
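The descending-order rearrangement of bars mentioned for nominal data can be sketched in a few lines of Python. This is only an illustration: the function name `sorted_bars` and the counts below are hypothetical, invented for the example, and are not the lecture's data. The bars are drawn as plain text, one `#` per individual, so no plotting library is assumed.

```python
def sorted_bars(counts):
    """Return (label, count, percentage) tuples, largest count first."""
    total = sum(counts.values())
    # Reordering is only meaningful for nominal data, where the
    # categories have no inherent order to preserve.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(label, count, 100 * count / total) for label, count in ordered]


# Hypothetical counts for illustration only (NOT the lecture's data).
hypothetical_counts = {
    "Married": 12,
    "Widowed": 2,
    "Divorced": 4,
    "Never married": 6,
}

bars = sorted_bars(hypothetical_counts)
for label, count, pct in bars:
    # One '#' per individual; the percentage is printed as the bar label.
    print(f"{label:<15}{'#' * count} {pct:.1f}%")
```

Because the percentages are just the counts divided by a common total, sorting by count and sorting by percentage give the same order, which matches the point that the shape of the chart does not change when the y-axis switches from counts to percentages.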