So, what is data abstraction? It is the idea that you can describe data in ways that help you decide what encoding methods are available and appropriate for this information. Once again, using the same dataset that I've shown you before, we have the table of vehicle collisions. I know, for instance, that the field that collects information about boroughs represents a number of different categories: every single borough is a category. And I know that bar charts are appropriate visual representations for what we call categorical information. So, in this example, I knew that an appropriate way to represent this information is to use a bar chart.
Let me show you another example, again with the same data. Let's say that we want to show how vehicle collisions are distributed across the New York City map. Well, again, an appropriate and natural visual representation is a map that uses color density by borough to show how many collisions happened there. So, why do I show you that? To communicate the idea that knowing what type of data you have helps you make choices about what visual representations are available and appropriate.
Now, I want to introduce the important concept of data abstraction. So what is data abstraction? Data abstraction is a method to describe data in ways that help you decide what operations and encoding methods are available and also appropriate. So, referring back to the previous diagram, since we have data transformation and data encoding, data abstraction is a method that helps you figure out what transformations are possible, and what visual representations are possible, with the data that you have.
Let me give you a couple of examples. Here, we have once again a dataset that describes vehicle collisions in New York City. One of the attributes that we have in the table is the borough. And these different boroughs represent a number of categories.
And I know that an appropriate chart to represent, for instance, the number of collisions in the different boroughs is a bar chart. Why a bar chart? Because a bar chart can accommodate categorical information: a set of categories with associated values.
Let me give you another example. Same data: here we have in the dataset information about how these vehicle collisions are distributed geographically. So once again, if I'm able to identify this information, I know that one option available to me as a visualization designer is to use a map to show how vehicle collisions are distributed geographically.
So, another way to describe data abstraction is that it's a way to recognize common structures in data that do not necessarily come from the same domain. Every single dataset describes a particular phenomenon, and comes from a particular domain. But we need a way to abstract away from the domain, so that the common structures help us define what visual representations are available and appropriate.
Let me give you a couple of examples. Let's imagine that you have data that describes friendships on Facebook, right? Maybe these could even be data coming from your own Facebook account if you have one, or Twitter, or any other social network. If you look at the connections between you and all the friends that you have, and among the friends that you have, you can build a data structure that is a network, right? Let's take another example. Say biological data describing interactions between molecules and proteins. There are some molecules and proteins that interact and some others that don't. You can describe this as a network. Or say that you are an investigator, and you have a dataset that describes connections between different people who may be part of some criminal organization. Okay. So, all of these cases describe data that come from very different domains, and probably with very different purposes and goals.
But what they have in common is that they can all be described as networks. So, here we are doing an operation of data abstraction. We now know that all these different datasets, coming from different domains, can be described with the same structure: a network.
Let me give you another example. Say that you have a dataset that describes animal movements. Maybe you are a biologist who is studying animal behavior. There are lots of these kinds of datasets out there. Another dataset could be the results of elections in the United States, or another one is, say, engineering data coming from an experiment to investigate how air flows in an aircraft. So, these are very different domains. But what all these datasets have in common is that they describe spatial phenomena. These are all phenomena that happen in space. So, here is another example of what I mean when I say data abstraction: abstracting away from the domain to identify characteristics that are useful to decide what visual representations are available and appropriate.
So typically, when people start studying how to create effective visual representations, they come to me with some dataset, and the first question is, "How can I visualize this?" And my answer typically is another question, which is, "What type of data do you have?" That's the first step. You need to identify what kind of data you have, and this is called data abstraction.
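To make that first step concrete, here is a minimal Python sketch of the two collision examples above. It is only an illustration under stated assumptions: the sample rows, the field names (`borough`, `latitude`, `longitude`), and the two lookup tables are hypothetical and not part of the actual dataset or of any library. The idea is simply that once a field is described by its abstract type, both the possible transformations and the candidate visual representations follow from that type rather than from the domain.

```python
from collections import Counter

# Hypothetical sample rows; field names and values are made up for illustration.
collisions = [
    {"borough": "BROOKLYN", "latitude": 40.68, "longitude": -73.94},
    {"borough": "QUEENS", "latitude": 40.73, "longitude": -73.79},
    {"borough": "BROOKLYN", "latitude": 40.65, "longitude": -73.95},
]

# Abstraction step: describe each field by its abstract type,
# not by what it means in the domain (assumed mapping).
ABSTRACT_TYPES = {
    "borough": "categorical",
    "latitude": "spatial",
    "longitude": "spatial",
}

# Candidate encodings per abstract type, limited to the two examples
# discussed in the lecture (bar chart and map).
CANDIDATE_ENCODINGS = {
    "categorical": ["bar chart"],
    "spatial": ["map"],
}


def candidate_encodings(field):
    """Return the visual representations available for a field's abstract type."""
    return CANDIDATE_ENCODINGS[ABSTRACT_TYPES[field]]


def count_by_category(rows, field):
    """A transformation that categorical data allows: count rows per category."""
    return Counter(row[field] for row in rows)


borough_counts = count_by_category(collisions, "borough")
print(candidate_encodings("borough"))
print(dict(borough_counts))
```

The counts per borough are exactly the "set of categories with associated values" that a bar chart can accommodate. A dataset from a completely different domain would go through the same lookup as long as its fields were given the same abstract types.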