I'd like to show you how to make a choropleth map in ArcMap, using different classification methods, setting the color scheme, and working with normalization. So, let's get started. Here, I have some census tract data for Toronto. I'm going to right-click on the feature class and go to the properties, and the Symbology tab is already selected. The first thing you should notice is that you can just have a single symbol, and that's what I have now. In other words, even though I have a whole series of different polygons here, I'm using the same symbol for all of them. I'm treating them all as though they are the same thing, one single symbol, and I've just used a beige fill, pretty simple. I could pick anything I want here; I can change the fill color to something else and say OK. So, that would be treating it as though it's all one single color.
Now, we go back to Symbology and do something a little more interesting than that. We go to Quantities and Graduated colors, which is what ArcGIS calls choropleth mapping. The cartographic term for it is a choropleth, but in the software they refer to it as graduated colors. There are two different ways that you can create a choropleth map in the software. One is that you take a value such as population, and really, for a choropleth, you should usually normalize or standardize the data in some way. In this case, we normalize it using the area in square kilometers. What I've told the software is that I want to map population density, essentially. In order to do that, it takes the population field and the area field and divides population by area on the fly, as we say. It calculates those values without me actually having to tell it to do the calculation, then it figures out what classes to use and what the color scheme should be, and puts it on the map.
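If you want to see what that on-the-fly division amounts to outside the software, here is a minimal Python sketch. The tract records, the field names (`population`, `area_km2`) and the `normalize` helper are all made up for illustration; this is not how ArcMap implements it internally, just the same arithmetic.

```python
# Hypothetical census tract records; field names and numbers are invented.
tracts = [
    {"tract_id": "A", "population": 4200, "area_km2": 1.2},
    {"tract_id": "B", "population": 6100, "area_km2": 0.5},
    {"tract_id": "C", "population": 3000, "area_km2": 6.0},
]


def normalize(records, value_field, norm_field):
    """Divide value_field by norm_field for each record.

    Records whose normalization value is zero are skipped, since the
    ratio is undefined for them.
    """
    result = {}
    for rec in records:
        if rec[norm_field] == 0:
            continue
        result[rec["tract_id"]] = rec[value_field] / rec[norm_field]
    return result


# Population divided by area gives people per square kilometer.
density = normalize(tracts, "population", "area_km2")
for tract_id, value in density.items():
    print(f"{tract_id}: {value:.0f} people per square km")
```

Notice that tract C has a fairly large population but the lowest density, which is why mapping the raw count instead of the normalized value can be misleading.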
So, if we do that, you'll see that by default it's chosen five classes with the natural breaks classification method, also known as Jenks. If we go into this, we can explore it a little bit; there are different options available to us. Let's look at a few things in this dialog box, because there are a few different things going on. Up here is the method. We've got natural breaks, five classes. I'm not going to exclude any data, but you could if you wanted to exclude something like an outlier, like we have down here. Over here are the break values. Those numbers correspond to these numbers here, so 4340, 4340. That blue line there corresponds with the first entry in the class table, the second blue line is the second one in the class table, the third blue line is the third one in the table, and so on.
We can make these whatever we want. I could just move these around manually: I think I want one of the class boundaries there, I think I want one there. So, you can drag the blue lines around. These are the class boundaries that we're going to assign to create the classes that we're interested in. I could also do it this way: I could just type in a value and say I want it to be 10,000 and then 15,000, whatever I want it to be. So, there are different ways that you can manually set these if you want. You'll notice that at the top here it has automatically switched to the manual method, because I started changing them manually.
But if I go back here, you'll see that there are different methods available: equal interval, defined interval, quantile, natural breaks, geometrical interval, and standard deviation. I won't go through all of them, but I'll give you a sense of how some of the most common ones work. Equal interval, for example, as the name implies, with five classes, takes the range of values.
So, whatever the bottom value is and the top value is, it takes that entire range, and because I've said five classes, it divides that up into five equal intervals. So, they are all of equal size here. Okay? If I said four classes, it would divide it up into four intervals, and so on. You can pick whatever you want; there's no magic number. Five is a common one, but it could be 3, 4, 5, 6, or 7. Anything less than three is usually not very useful, and anything more than, say, seven gets harder for people to interpret. I won't say you would never use it, but usually something in the range of three to seven works well.
So, let's have a look at what's going on here. One thing you notice is these gray bars here, and then there are just a few of them here at the bottom. This is a histogram of the population density values that have been calculated for all the census tracts in my dataset for Toronto. It's not meant to be shown in a report; it's not a really detailed histogram. It's really just meant for you to visualize the overall distribution of the dataset. It does that well, and the main thing you notice with this one is that the vast majority of the values are in this one class. Remember, with a choropleth map, we're assigning a color, a hue or a gradient of some kind, to each of the classes. So, all of these are going to end up with the same color in our map. Then we'll have a different color for this one, and this one, and this one. You'll notice there's practically nothing in this one up here; there's just this outlier.
So, let's have a look at how this will end up looking on our map. I'm going with this default purple color scheme, and I say OK. The resulting map ends up having these large areas that all look the same.
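To make the equal interval idea concrete, here is a small Python sketch. The density values are invented (they are not the Toronto data), and `equal_interval_breaks` and `classify` are hypothetical helper names. It shows how a single outlier stretches the range so that nearly everything lands in the first class.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class for an equal-interval scheme."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # The last upper bound is the maximum itself, so nothing falls outside.
    return [lo + width * i for i in range(1, n_classes)] + [hi]


def classify(value, breaks):
    """Return the index of the first class whose upper bound is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


# Invented, skewed density values with one extreme outlier.
densities = [500, 800, 1200, 1500, 2100, 2600, 3000, 3500, 4200, 50000]

breaks = equal_interval_breaks(densities, 5)
counts = [0] * len(breaks)
for d in densities:
    counts[classify(d, breaks)] += 1

print("class upper bounds:", breaks)
print("values per class:  ", counts)
```

With these made-up numbers, nine of the ten values share the first class and the outlier sits alone in the last one, which is the same lumping the histogram shows.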
That's because all of these census tracts have values that ended up in the same class; they're all lumped together in that histogram. When that happens, they all end up with the same color, and so your map ends up looking kind of monotonous. I encourage you to always look at both the geographic distribution and the statistical distribution of your data. So, look at the histogram: what does that look like? Then, when you apply a particular classification scheme to your map, what does that look like? I encourage you to always think about whether you are able to see or show similarities, differences, relationships, clusters, or gradients that go from high to low, or something like that. If your map looks like this one, where everything looks kind of the same, that's really what your map is communicating. I doubt that's what you want to get across. You want to have a little more variety and have something that is more interesting and useful for the map reader.
Okay. So, let's go back to the drawing board. We'll open our properties again, go back to Symbology, and try a different classification scheme. Instead of equal interval, we can try quantiles. With quantiles, it looks at the total number of values, sorts them from lowest to highest, and then, let's say I had 100 values with five quantiles, that means there will be 20 values in the lowest quantile, another 20 values in the next highest quantile, and so on. So, in this case you're emphasizing relative differences, unlike with something like equal interval. You'll notice that when we do that, we end up with differently sized classes here. We've got one really big class here, because we've got this couple of outliers, if you want to count that one. So, we have a really big class here, because it has the same number of values as this next one, which has more values that are similar to each other.
So, they end up being closer together in the histogram, and so on with the rest of them. So, if we do that, and we say, "Okay, what does this look like on our map?": same data, but a dramatically different pattern being shown with this classification scheme. This, I would argue, is a more useful map for showing population density across the City of Toronto.
We can go back and try the natural breaks method, which is the default. The reason it's the default is that it works pretty well most of the time for most datasets. The algorithm that's used is a little complicated, but essentially it's trying to find breaks in the dataset: clumps of data within the histogram that are similar to each other, and other ones that are different. With this histogram here, I wouldn't say it's blindingly obvious, but I suppose what it's finding is that there's a natural break there, there's a break there, so there's a clump of data there, there's a bunch of values that are similar there, and there's a clump of data here. It's maybe not the most intuitive thing. Sometimes there'll be a really nice, say, bimodal distribution, and you say, "It's really obvious, that's the break there," and it will put it into two classes, but often it's not as obvious. Still, if you give it a try, often it will work well. You may have to experiment with the number of classes a little bit, but let's just see what happens if we go with the natural breaks, five classes method here. So, then we get a different pattern again.
It's hard to say which one actually works best here. I have to say I think the quantile looked pretty good, but you'll have to work with it and experiment with it. The idea is: think about it, experiment, and look at what your map is showing. Is this what I wanted to show? Will somebody interpret this the way I intended them to interpret it? Will the data speak well?
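For anyone who wants to experiment with these two methods outside the dialog box, here is a rough Python sketch. The sample values and the function names are invented. The quantile function simply puts about the same number of sorted values in each class. The natural breaks function is an assumption on my part about the general idea behind Jenks: it picks the breaks that minimize the squared deviation within each class, using a small dynamic program. It is not ArcMap's actual implementation, and it is only practical for small lists.

```python
def quantile_breaks(values, n_classes):
    """Upper bound of each class when every class holds about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    if n < n_classes:
        raise ValueError("need at least one value per class")
    # The last value belonging to class i sits at index (i * n) // n_classes - 1.
    return [ordered[(i * n) // n_classes - 1] for i in range(1, n_classes + 1)]


def natural_breaks(values, n_classes):
    """Upper bound of each class minimizing within-class squared deviation."""
    x = sorted(values)
    n = len(x)
    if n < n_classes:
        raise ValueError("need at least one value per class")
    # Prefix sums give the squared deviation of any slice in constant time.
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for the slice x[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i
    # Walk the stored cut points backwards to recover each class's upper bound.
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(x[j - 1])
        j = cut[k][j]
    return breaks[::-1]


# Invented values with three visible clumps: 1-4, 10-11 and 50-52.
sample = [1, 2, 3, 4, 10, 11, 50, 51, 52]
print("quantile:      ", quantile_breaks(sample, 3))
print("natural breaks:", natural_breaks(sample, 3))
```

On this sample, the quantile scheme splits the 1 to 4 clump so that each class holds three values, while the natural breaks sketch keeps each clump together. That is exactly the kind of difference that changes what the data appears to say.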
In other words, will we be able to see what the data's really got in it, as opposed to hiding something? That's not to say that you would be doing that intentionally, but sometimes you can mask trends, patterns, or clusters unintentionally just by picking the wrong classification scheme. So, like I said, you really have to experiment with this a little bit and see what's going to work best.
One other thing I wanted to point out, and this is something that people get confused by sometimes, is that you don't always have to use this normalization drop-down. If you have already calculated something like a density value, if you've already normalized the data inside the attribute table, then you don't need to use the normalization drop-down. If I go up to here, you'll notice that I actually do have a population density field already. If that's the case, then I do not need to normalize again, because I've already done that calculation; in that field, I've already divided population by area. So, you don't have to do it again. Sometimes people feel like, "Oh, I have to normalize it because it's a choropleth." That's true only if you haven't already done that calculation inside the field. So, if I do that and apply this, you'll notice that the map looks identical. It doesn't matter whether you get it to do it on the fly using the normalization drop-down or whether you calculate it yourself using the field calculator. You should end up with the same values, and if you're using the same classification scheme, your map will end up looking the same. Really, what it comes down to is whether the column that you want to map, such as population density, is something that is useful to have stored in your dataset, so you go in and add a column, do the field calculation, and calculate density, or whether it's something that you're happy to just have it do on the fly when you need it.
There's really no difference. It's completely up to you as a user what's convenient and when you might want to use one versus the other, just as long as you're doing the calculation correctly.
I will point out that there are different color schemes available. Regardless of which classification method you use and how many classes you have, you'll probably want to use a color ramp, like we have here. Okay? The idea is that you want to make it clear to the map reader that you have low values grading to high values, that there's a trend there, assuming that the data is appropriate for a choropleth, like we have here. So, we have this light purple for low values and a dark purple for high values. There are lots of different color ramps available, and some of them are more useful than others for choropleth mapping. A common pet peeve of cartographers is that a lot of people like to use something like a rainbow color ramp. You end up with this really bright color scheme. For one thing, even if you would never use this, I would probably reverse it so that the cool color is at the bottom and the warm color is at the top. Just a little tip there: you can right-click and say Flip Symbols, and that takes the color ramp and flips it around.
Let's have a look at this. This is not a color scheme that you really want to use. There are all kinds of things wrong with it: there's too much contrast, and there are too many bright colors. It's too hard for a map reader to figure out what they're looking at. There are all sorts of things related to cognitive perception, to how your brain processes these colors and what it interprets as being a higher value versus a lower one. Years ago, these were quite popular, and I admit I used to use them a long time ago when I was starting out.
They were popular because it's really obvious that the colors are different from one another, but it's not something that I would recommend for this data. The only time I would use it, and even then I would probably go with a lighter version of this and flip it again, is if you were actually showing temperature values from cool to warm. Let's say these were different temperatures across Toronto, something to do with the urban heat island; then it would be useful and appropriate to do this, but not for your typical choropleth map.
What is recommended is to use one color going from light to dark. It can even be a gray scale. Again, we would flip this, so let's have a look at how that looks. Yes, that looks good. The darkest areas have the highest values, and the lightest areas have the lowest values. So, even something like a gray scale can work well, especially if you want to put something else on top of it, let's say a road network or a transit line.
There are lots of different ones you can use. I won't go through all of them, of course, but just make sure that you are using single-color schemes like that. Do not use the ones that we would refer to as diverging color schemes. Those are useful if you're trying to show values that are above or below an average, or above and below zero, or whatever. But a choropleth like this goes from a low value to a high value, so you don't want to use a color scheme that implies there's some neutral middle value. So, I recommend sticking to ones like these. For example, red is a common one, a very popular combination there. It may not look quite as exciting, but the idea is that it's very simple and obvious for people to interpret.
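As a rough illustration of a single-color ramp, here is a Python sketch that steps from a light color to a dark one, with one color per class, and then flips the order the way Flip Symbols does. The RGB values and the function names are my own picks for the example, not values taken from ArcMap's built-in ramps.

```python
def sequential_ramp(light_rgb, dark_rgb, n_classes):
    """Linearly interpolate n_classes colors from light_rgb to dark_rgb."""
    if n_classes < 2:
        raise ValueError("a ramp needs at least two classes")
    ramp = []
    for i in range(n_classes):
        t = i / (n_classes - 1)  # 0.0 for the first class, 1.0 for the last
        ramp.append(tuple(round(l + (d - l) * t)
                          for l, d in zip(light_rgb, dark_rgb)))
    return ramp


def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Illustrative light and dark purples; any single hue works the same way.
light_purple = (242, 240, 247)
dark_purple = (84, 39, 143)

ramp = sequential_ramp(light_purple, dark_purple, 5)  # low values are light
flipped = ramp[::-1]                                  # same colors, reversed

for low_to_high, reversed_color in zip(ramp, flipped):
    print(to_hex(low_to_high), to_hex(reversed_color))
```

Because the ramp only ever gets darker, there is no middle color that reads as neutral, which is what separates it from a diverging scheme.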