When we're mapping quantitative data, we often want to classify that data, or group it into classes, in order to map it more effectively. This is true for choropleth maps, and you could use it for other things like graduated symbol maps. So let's have a look at how we can clump those data values together into classes in different ways, how that works, and why we may or may not want to use each method for various applications. When we want to make something like a choropleth map, you'll see here that we have one value for each of these areas. And we want to group them together into classes in order to create the gradation of colors that will result in our choropleth map. So what we have here are data values that are grouped into classes. That's what we call them. And we have different methods available that can be used to group those values together. The method that we use and the characteristics of the data set will have a big effect on how our map looks. Here we have some census tracts for the city of Toronto. Let's have a look at what happens to the way the mapped data looks based on the classification method that we choose. So for example here, I'm looking at income data. This is median household income for each census tract. Here we use the equal interval method. I will explain these methods a little more as we go along, but the idea here is that the class boundaries are equally spaced apart. Okay? So they are equal intervals. And what we're seeing here, just to make sure this is all clear, is a histogram of the census tract values. And we've assigned our class boundaries, so these are the class boundaries, also known as break values. So for example, this 22716 is the 22716 over there, okay? I want you to get used to looking at this dialog box, because you'll probably be using it a lot.
And so the next one here is there, and so on. So these are all of the census tract values, and these are the class boundaries. What this is trying to show you is the distribution of the data values, and how that relates to how many of those values will end up in each class. So, with the equal interval method, if we look at our income values, this is the map that we get here at the top. Okay, so that's one pattern that we can get by using that particular data classification method. If we take exactly the same data but use a different data classification method, in this case quantiles, we get a very different-looking map. And so here, same thing again. We have our class boundaries as these blue lines, but notice that they're spaced differently now. Very quickly, quantiles are based on the idea that you have an equal number of values in each class. I'll get into more of these as we go along, but the main thing I want you to see now is that even though we have five classes in both methods, the methods differ in the way that they go about grouping those data values together. So when you then assign these colors to each class, you get a very different-looking map here than you do here with, like I said, exactly the same data. We can use a third method known as natural breaks, which looks for clumps of data, or places where there are breaks between clumps of data. And so, here we have these blue lines, which are spaced based on a rather sophisticated algorithm that looks at the distribution, so the breaks are customized for each distribution of data specifically. And then we end up with a different pattern of colors for our data set again.
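To make the three ideas concrete, here is a minimal Python sketch of how break values could be computed for each method. It is not the software's actual implementation. The function names and the sample values are made up for illustration, and the natural breaks function assumes a Fisher-Jenks-style approach (minimizing the squared deviation within each class), which is one common way to formalize "looking for clumps".

```python
def equal_interval_breaks(values, k):
    """Upper class boundaries spaced evenly between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    breaks = [lo + width * i for i in range(1, k)]
    return breaks + [hi]  # the last boundary is the maximum itself


def quantile_breaks(values, k):
    """Upper class boundaries giving (about) the same number of values per class."""
    ordered = sorted(values)
    n = len(ordered)
    # The i-th boundary is the ceil(n * i / k)-th smallest value.
    return [ordered[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]


def natural_breaks(values, k):
    """Upper class boundaries minimizing total within-class squared deviation.

    Assumption: a Fisher-Jenks-style dynamic program; needs len(values) >= k.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Prefix sums let us get the squared deviation of any slice quickly.
    s1, s2 = [0.0], [0.0]
    for v in ordered:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for ordered[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total deviation when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i
    # Walk back through the recorded split points to recover the boundaries.
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(ordered[j - 1])
        j = split[c][j]
    return breaks[::-1]


# Hypothetical values with four obvious clumps.
sample = [1, 2, 3, 10, 11, 12, 50, 51, 52, 100]
print(equal_interval_breaks(sample, 4))  # [25.75, 50.5, 75.25, 100]
print(quantile_breaks(sample, 4))        # [3, 11, 51, 100]
print(natural_breaks(sample, 4))         # [3, 12, 52, 100]
```

Same ten values, same number of classes, three different sets of break values. Only the natural breaks result lines up with the gaps between the clumps.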
So if we compare equal intervals to quantiles and natural breaks, the main thing I want to get across right now is that these are all the same data, but the choice of data classification method will tell a different story depending on what it is that you are looking at. If we do the same thing again, only this time looking at population density, that becomes even more dramatic. So I've used equal intervals on the left, quantiles in the middle, and natural breaks on the right. And look what happens here: because a lot of the values for population density are low, a lot of the values have been clumped together in this bottom category, or this bottom class. And what that means is we have very few values in this class, or this one, or this one, or this one. When that happens, you end up with a lot of values that are in the same class. With a choropleth, that means they're going to be shown with the same color, and so here most of the map is all the same color. And that's not ideal. When you see something like that when you're creating a choropleth map, you should think about, well, is this really telling me or telling my map reader a useful story? Remember, a big thing about maps is that they're a form of communication. We're trying to show people similarities, differences, and relationships. And what you're telling somebody who is looking at this is that the population density across Toronto is basically all the same. And that's really not accurate; it's not a very good way of representing the data. So when you see something like this, you should probably think, maybe this isn't the best method. If we look at quantiles, we're working with the number of values in each class. Basically what you're doing is sorting the values from lowest to highest and picking the number of quantiles. In this case, we have five. Then we're going to have an equal number of values in each of those quantiles, ranked from lowest to highest.
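The clumping effect described above is easy to reproduce with a small Python sketch. The density values here are hypothetical, chosen only to be strongly skewed toward the low end, and the helper names are made up for illustration.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper class boundaries spaced evenly between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_breaks(values, k):
    """Sort the values, then take boundaries that split them into k equal groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]


def class_counts(values, breaks):
    """Count the values in each class; a value equal to a boundary goes in the lower class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts


# Hypothetical population densities: mostly low, with a couple of very high tracts.
densities = [100, 150, 200, 250, 300, 350, 400, 450, 500, 600,
             700, 800, 900, 1000, 1200, 1500, 2000, 3000, 6000, 20000]

print(class_counts(densities, equal_interval_breaks(densities, 5)))  # [18, 1, 0, 0, 1]
print(class_counts(densities, quantile_breaks(densities, 5)))        # [4, 4, 4, 4, 4]
```

With equal intervals, 18 of the 20 values land in the bottom class, so 18 of the 20 areas would get the same color. With quantiles, each of the five classes holds four values.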
And so, we end up with a very different-looking map here than we would with equal intervals. And then natural breaks is looking for clumping and breaks in the data, and we get something different again. So, for example, if you were, I don't know, a developer who's proposing to put up a huge condominium somewhere, you could show a map like the one on the left and say to city council, or whoever you're trying to get to approve this, look, the city is practically empty. There's nobody living there. The population density is so low all across the city. There's tons of room for growth here. Why don't you let me have my condominium? And then somebody else could come along, if they were against that being built for some reason, and say no, no, no, look, there's tons of population density, especially downtown. Too many people already, we don't need any condominiums. Or you could look at this map and have kind of a balance between the two and say, well, actually there are some areas that are higher and some areas that are lower, so maybe we need a more nuanced approach. We could look at where density may be too high already or where there's room for growth. And so, again, the whole idea here is that there's no one correct answer all the time for all maps. I can't just tell you that you should always use this data classification method or you should always use this number of classes. Some of it is subjective. And one thing that's really important is to look at distributions. And I mean that in two different ways. You want to be able to see the distribution of the data statistically, in terms of the histogram, so there. And you also want to look at the distribution geographically on your map. The two are inextricably linked: the class boundaries that you use here, in relation to your histogram, are going to have a huge effect on how your map looks.
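If you want to look at class boundaries in relation to the histogram outside the software, a rough text version can be sketched in Python. Everything here is illustrative: the function name, the hypothetical density values, and the break values (which are the five-class quantile boundaries for those same values). The sketch assumes every break value lies between the minimum and maximum of the data.

```python
def histogram_with_breaks(values, breaks, bins=10):
    """Return (bin_start, count, breaks_in_bin) for each equal-width histogram bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins  # assumes the values are not all identical

    def bin_index(x):
        # The maximum value belongs to the last bin rather than a bin of its own.
        return min(int((x - lo) / step), bins - 1)

    counts = [0] * bins
    for v in values:
        counts[bin_index(v)] += 1
    marks = [[] for _ in range(bins)]
    for b in breaks:
        marks[bin_index(b)].append(b)
    return [(lo + i * step, counts[i], marks[i]) for i in range(bins)]


# Hypothetical, strongly skewed densities with their five quantile break values.
densities = [100, 150, 200, 250, 300, 350, 400, 450, 500, 600,
             700, 800, 900, 1000, 1200, 1500, 2000, 3000, 6000, 20000]
quantile_break_values = [250, 450, 800, 1500, 20000]

for start, count, marks in histogram_with_breaks(densities, quantile_break_values):
    note = f"  <- breaks {marks}" if marks else ""
    print(f"{start:>7.0f} | {'#' * count}{note}")
```

For this sample, four of the five quantile breaks fall inside the first histogram bar, which is where almost all of the values are. Equal interval breaks on the same data would instead be spread evenly along the axis, mostly across empty bins.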
And really, all I can do is tell you to look at things like, you know, is your data normally distributed? Do you have outliers? Is there something that's out of whack? Are there too many values in one class? Does too much of your map look the same? Are you not seeing distributions well? All these things are things you'll build experience with and become more fluent in, in terms of making those maps better for what it is that you're trying to represent. When you're in the software and you're trying to make a choropleth map, or you're trying to divide up your data into classes, there's literally just a drop-down box here that will list for you the different classification methods that are available. So there's Manual, Equal Interval, Defined Interval, Quantile, Natural Breaks, and so on. I might not go through every single one of those, but I want to give you an idea of some of the most commonly used ones, how they work, when you might want to use them, and the pros and cons and things to think about when you're making a quantitative map. If you want to do a little more reading on this or get a little more explanation about it, I would recommend going to the documentation that Esri has online for ArcGIS. So this is a page called "Classifying numerical fields for graduated symbology". And it essentially walks through what I'm going to say here. I've actually borrowed some of my figures and explanations from this page, and I want to make that quite clear. I think they've done a good job here. And I think it's useful, because you may be listening to me saying this, and it might make sense at the time, but sometimes you need to actually go read something, words on a page, so to speak, to absorb it a little bit better or to think about it in a different way. So, I'd have a look at that, it might be helpful to you.