We collect data so that we can make better decisions. For instance, you may collect data on how a marketing promotion is impacting sales, or on the productivity of the workforce, and so on. But looking at piles of data is not very useful on its own. You need to understand the data, and one of the best ways to gain a better understanding of what the data is telling us is to organize and summarize it through visual means. In this module, we will focus on visual summarization techniques for both quantitative variables and categorical variables. Specifically, frequency tables, histograms, pie charts, and scatter plots.

We will start with frequency tables. You see many tables such as this in the media or in business meetings. The table here shows us, at a glance, the top 13 best-selling pickup trucks in America, up to August of 2015. This type of table is known as a frequency table, and it is very useful for summarizing categorical data. In this case, the category is truck model name. Looking at this table, you can see that we have data on 233,601 trucks that were sold in the first 8 months of 2015 in the US. If you looked at the actual 233,601 records, you would be hard pressed to know right away which one is the number 1 seller and which one is the number 2 best seller. Summarizing the data in a table like this helps us know exactly what has sold and how the models stack up against each other.

We can also show this information with a bar graph. The height of each bar represents the number of trucks sold for each specific model. This is another way of taking the raw data and showing a quick overview of the data collected, which helps us communicate more effectively about this data. Instead of counts of each model sold, we can also summarize each category by its relative frequency. For example, we know that the Ford F-Series is the best-selling truck, but what is the market share for this truck?
We can answer this type of question by finding the relative frequencies. We get the relative frequency by taking the number of times a value was observed, divided by the total number of observations. So for example, for the Ford F-Series that is 71,332 divided by 233,601, or 0.3054. So now, not only do we know that the Ford F-Series is the best-selling truck, but we can say that about 30.5% of the market belongs to Ford. Just by summarizing, we begin to gain insight about our data. We can create a relative frequency for all models, and now we can see that the top 3 best sellers collectively have about 73% of the market. This is a bar graph with the relative frequencies displayed on the y-axis.

Now, imagine a call center. An important part of the customer experience is getting to an agent quickly. If we want to understand what customers are experiencing, we can gather some information on waiting times. Every customer could have a slightly different experience from another. Look at these ten observations. This table shows waiting times in seconds for ten customers. But let's say we have 500 such observations. Looking at 500 entries will not give us a quick understanding of what is happening. So we can use a frequency table to summarize the data and gain a better understanding.

To create a frequency table for quantitative data similar to what we have here, we need to create ranges and then count how many customers fall into each range. For example, we can place customers in a range of 0 to 0.5 minutes, that is, 30 seconds or less. Then the customers waiting between 31 seconds and 60 seconds go in the second range, and so on and so forth. Here, we see the summary of 500 observations, and the values in the frequency column represent the number of customers experiencing various waiting times. For instance, 66 customers waited between 181 and 210 seconds before they could talk to an agent.
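As a rough illustration of the ranges-and-counts idea just described, here is a minimal Python sketch. The ten waiting times are made up for illustration (the lecture's actual observations are not reproduced here), and the names `waits`, `bin_label`, and `frequency_table` are hypothetical. It assumes waiting times are whole seconds and uses the 30-second ranges mentioned above (0-30, 31-60, and so on).

```python
from collections import Counter

# Hypothetical waiting times in whole seconds (made up for illustration;
# these are NOT the observations shown in the lecture).
waits = [12, 45, 30, 61, 185, 200, 95, 31, 210, 150]

BIN_WIDTH = 30  # seconds per range, as in the 30-second ranges above


def bin_label(seconds):
    """Return the range a waiting time falls in, e.g. '31-60'."""
    # Ranges are 0-30, 31-60, 61-90, ... so each upper bound is inclusive.
    index = max(0, (seconds - 1) // BIN_WIDTH)
    low = 0 if index == 0 else index * BIN_WIDTH + 1
    high = (index + 1) * BIN_WIDTH
    return f"{low}-{high}"


def frequency_table(times):
    """Return rows of (range, frequency, relative frequency)."""
    counts = Counter(bin_label(t) for t in times)
    total = len(times)
    # Sort ranges by their lower bound so the table reads top to bottom.
    rows = sorted(counts.items(), key=lambda item: int(item[0].split("-")[0]))
    return [(label, count, count / total) for label, count in rows]


table = frequency_table(waits)
for label, count, rel in table:
    print(f"{label:>8}  {count:>3}  {rel:.1%}")
```

Ranges with no observations are simply left out of this sketch; a fuller table would likely list them with a frequency of zero.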
The relative frequency column shows what percent of customers experienced various waiting times. In this case, about 13% of the customers waited between 181 and 210 seconds, that's between 3 and 3.5 minutes.

Many reports will use various ways of summarizing data, and you have to look closely to see how the numbers have been summarized. Here's a table from a report by the US Department of Transportation. Let's see how to use it to answer a couple of questions, like: which age group had the highest level of fatal crashes because the drivers were distracted? 15 to 19 year olds happen to be that age group. Out of 3,212 drivers in that age group involved in a fatal crash, 344 crashed because they were distracted. What about which age group had the highest level of fatal crashes because the drivers were using a cell phone? Again, the answer is 15 to 19 year olds. Here, 72 out of 344 distracted drivers were distracted because they were on a cell phone. Thus, 21% of all fatal crashes due to distracted driving in this age group occurred because the driver was using a cell phone. Now, I will let you try. Find out what percentage of distracted 20 to 29-year-old drivers were using a cell phone when they had a fatal crash. The answer is 15%, which is 117 out of 790.

So again, by using frequency tables we can take large amounts of data and create a summary that is easy to understand and very useful in communicating. It will also help us ask questions that eventually improve our understanding of what is going on, like how to decrease the number of fatal crashes. In this lecture, you learned two graphical methods for summarizing data. Once again, summarizing the data enables us to get a quick understanding of what is going on and allows us to begin exploring the data with more sophisticated statistical methods.
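The percentages read off the Department of Transportation table can be checked with a few lines of Python. This is only a sketch: the counts are the ones quoted in the lecture, while the variable names and the `percent` helper are hypothetical.

```python
# Counts quoted in the lecture from the US Department of Transportation table.
drivers_15_19 = 3212      # drivers aged 15-19 involved in a fatal crash
distracted_15_19 = 344    # of those, drivers who were distracted
cell_15_19 = 72           # distracted 15-19 drivers using a cell phone
distracted_20_29 = 790    # distracted drivers aged 20-29
cell_20_29 = 117          # distracted 20-29 drivers using a cell phone


def percent(part, whole):
    """Relative frequency of `part` within `whole`, as a whole-number percent."""
    return round(100 * part / whole)


# Share of distracted drivers who were on a cell phone, by age group.
print(f"15-19: {percent(cell_15_19, distracted_15_19)}%")
print(f"20-29: {percent(cell_20_29, distracted_20_29)}%")
```

The choice of denominator is what matters here: dividing 72 by 344 describes distracted drivers only, whereas dividing 344 by 3,212 would describe all drivers aged 15 to 19 involved in a fatal crash.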