Choropleth maps are one of the most popular and commonly used map types out there, so let's have a look at how they work. The word choropleth was coined by a cartographer named John Kirtland Wright in 1938. He was trying to come up with a word to describe assigning values to different parts of a map, or different spaces. So, he went to the Greek origins: choros for space and plethos (multitude) for value. He combined those to create this new word, choropleth.

So, why am I telling you this? Why have I not mentioned the origin of words in other sections or videos? Well, because there's a common thing that people really want to do, which is to put an 'l' in there and call it a chloropleth map. It's one of those little things that kind of bugs me. It's not a chloropleth map; it's not related to chlorophyll or chloroform or chloro this or chloro that. It's choropleth. I always think it's important, especially when you're starting out and learning about these things, to get the terminology correct from the beginning. That way, you look like you know what you're talking about, and you're not making the kind of mistake that any trained GIS or cartography person will notice right away, like mispronouncing it as chloropleth. I think I've made my point. I just wanted to make sure that was clear so you won't make that mistake. You'll probably think of this and go, "Yeah, I'll never say that now," which is mission accomplished.

Okay. So, the whole idea of a choropleth map is that you have numbers that are assigned to areas. They could be anything, but probably the best place to start, and one of the most common ways of using them, is with things that we would call enumeration areas.
So, think of it like this: if you've counted the number of people for a census unit, something like a neighborhood or a ward or a congressional district or whatever it happens to be, you've got a number that you've assigned to a particular area. Here we've got some population counts for different census tracts, so we have one number per area. Now, on its own, if you're just looking at that and I quickly asked you, what's the highest value? What's the lowest? Is there a pattern going on here? Is there a gradation from low to high, from east to west? Something like that. If you're just looking at the numbers, it's really not that easy to see what's going on.

So, what we normally do is group those values together into classes and then assign each of those classes either a gradation from black to white, like a gray scale, or a gradation of some kind of color, like I've done here. So now, we have five different classes of values, and if we assign those to the numbers in our dataset, giving each one an intensity of color or shade that's proportional to its value, then you end up with a choropleth map, where it's much easier to visualize those actual numbers. I could put labels on there as well if I wanted to, but the idea is generally that you want someone to be able to look at the map and very easily see which areas have higher population values and which ones have lower values. Now, it's not super great to just use population counts; it's better to use something like density, but I'll get to that a little bit later. So, let's apply exactly the same idea to my entire data set here.
So, these are all the census tracts for Toronto, and here are the population counts for each of the census tracts. It's exactly the same thing if I said, "So, what's going on here? Where are the high areas and the low areas? Is there some kind of pattern going on? Can we see similarities and differences and relationships, things like that?" Of course, it's really difficult to do that from the numbers alone. But we can classify the data. In this case we can do this in ArcMap with the Symbology tab here, and I've specified the value as being population. I'm going to use five classes. It doesn't have to be five. It's not some magic number; it could be three or seven. I'm using a classification method here called quantile. We'll explain that a little bit more later. And I'm using a color ramp here from this light, what would you call it, kind of a magenta, to a darker one, and that color ramp is being used to assign a range of colors to these classes.

So, that's how this idea of a choropleth map is actually implemented in the software. That's exactly how you would set it up if you were going in and doing it yourself. You've got a data set, and you tell it which attribute to use, in other words, which column in your table. Then you tell it how many classes you want, you tell it how to divide up the numbers into those classes (there are different ways to do that), and then you tell it what color scheme to use. Boom, it puts it all together and you end up with a choropleth map like this. So now, just like I did before, I've got my different classes, and I've got gradational values that indicate low to high. Someone looking at this can very easily look at any part of the map and see which areas have lower population versus higher population. Now, I have a question for you: is this a useful map? I want you to think about it for a second, and this is a common thing with choropleth maps.
It's something that I think anybody who is teaching this wants to make sure comes across well: look at the size of the areas that are being mapped and at what's being mapped here. We're mapping people that live in a city, and we're using different sized areas to count up how many people are in those areas. So, I'm hoping you're seeing that if you have a big area, the odds are pretty good that there are going to be more people in that area, and so it'll have a higher value. If you have a smaller area, the odds are that there are fewer people at that location, and so you're going to have a lower value. In other words, it's not the most useful way of portraying what's going on here, because it's biased by area.

So, we want to be able to control for that area, or take it out, or normalize it, or somehow account for it, so that when we're making a map, we can show something that's more true to what's really going on. In this case, a better way of doing this is to normalize for area, and the way we do that is to divide the populations by area to create a population density. If I go back to my symbology, I have the option of using this thing here called normalization, and you can select what field you want to use for that. So, I'm going to go, population, and I'm going to normalize that by area. All that means is that I'm asking the software to take my population column, divide it by my area column, calculate on the fly, so to speak, what the population density values are, and use those in my choropleth map. So, I'm still using the same color scheme, I'm still using five classes, I'm still using quantiles. But now, I'm going to be representing population density.
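Here's a rough sketch in Python of what that setup amounts to: divide one column by another, split the results into quantile classes, and hand each class a color from a ramp. The tract numbers, the hex colors, and the function names (`quantile_breaks`, `classify`) are all made up for illustration. This is not ArcMap's actual implementation, and its quantile breaks may be computed differently.

```python
from bisect import bisect_left

# Hypothetical tracts: populations (people) and areas (square kilometres).
populations = [4200, 5100, 3900, 12900, 13500, 6100, 2800, 7400, 5200, 4800]
areas_km2 = [1.2, 0.8, 2.5, 77.0, 4.2, 1.0, 3.5, 0.9, 1.6, 2.0]

# Hypothetical five-step ramp from light to dark (one color per class).
color_ramp = ["#fde0ef", "#f1a7cf", "#de5fa8", "#b8127f", "#6b0050"]


def quantile_breaks(values, n_classes):
    """Return the upper bound of each class so classes hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th break is the value at the ceil(k * n / n_classes)-th position.
    return [ordered[(k * n + n_classes - 1) // n_classes - 1]
            for k in range(1, n_classes + 1)]


def classify(values, n_classes):
    """Return a class index (0 = lowest) for every value."""
    breaks = quantile_breaks(values, n_classes)
    # bisect_left finds the first class whose upper bound is >= the value.
    return [bisect_left(breaks, v) for v in values]


# "Normalize by area": population column divided by area column.
densities = [p / a for p, a in zip(populations, areas_km2)]

count_classes = classify(populations, len(color_ramp))
density_classes = classify(densities, len(color_ramp))

for i, (c, d) in enumerate(zip(count_classes, density_classes)):
    print(f"tract {i}: count color {color_ramp[c]}, density color {color_ramp[d]}")
```

In this made-up data, the tract with the huge area lands in the darkest class when you map raw counts and in the lightest class when you map density, which is the area bias in a nutshell.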
I'm taking out that bias that's being introduced by area, so let's see what happens in terms of my result. Here's what we get. We have a map that has a very different look to it than the population map. To me this makes a lot more sense, or it's more useful. This is downtown Toronto here, so yes, the population density is much higher. Look over here: we have a much larger census tract, probably one of the biggest ones in the city, but it turns out that there are really not that many people living there if you account for area. So, it actually has a fairly low population density, and that's probably more useful in a lot of situations when you're trying to interpret things that might be related to government policy, or what politicians might want to use in making decisions. Often, the density will be more useful than the count. Not always, but it's definitely something that you want to take into consideration when you're making a choropleth map.

Here's a comparison between the two. We have total population versus population density. So, these are just two different ways of thinking about a variable that you're trying to map using a choropleth, and about what's the most representative or useful way to do that.

This example might help you understand how area can bias a choropleth map. I really like this example. It's from a book by [inaudible] it's a great cartography textbook. The example here is that you have farmers' fields that have been divided up into different sized areas, and you want to make a choropleth of how much of those fields has been harvested. So, notice that we have 16 acres here, 16 acres here, and 64 acres there, so we're measuring the areas in terms of acres.
If we look at the total acres harvested, let's say we're harvesting corn. Here we have no acres harvested, so there was no corn harvested there. We have 16 acres harvested here and 64 acres harvested here. All that's different is that these have been divided up into different sized fields. If we make a choropleth map based on those raw counts, as opposed to accounting for area or normalizing, then this is the choropleth map that we would get. If we just interpreted this, we see a light green, which would mean that there was low or no corn harvested; we have a medium green, which would be a medium amount harvested; and this dark green would be a high amount harvested. That's the way that someone would interpret that choropleth map.

But if you add this up, 4 times 16 is 64. So, if you actually look at the same size area, the same amount of corn was being harvested, but you've got two different colors here. It's misleading, almost lying to somebody: they'll look at these two different colors and say there was less corn harvested here and more corn harvested there based on these counts, which is not accurate. It's not a good way of representing your data. However, if we divide by area, then you can see that these are now the same color, because we've normalized for area. We've divided by total acres, and now we've got something that's more representative. When somebody looks at that, they say, oh, these are the same color, that means there's the same amount harvested, and that's what we want them to see: something that's true and more representative of the data. If we apply exactly the same idea to the census tract data, let's see what happens there.
So, what I've done here is isolate two census tracts that are quite different in size. If we look at the population counts, you'll see that this really big census tract has 12,909 people in it and this smaller census tract has 13,530 people in it. So, the population counts are within five percent of each other, and you think, okay, yes. If we look at that as a population choropleth, these would be the same color, because they're very similar values; they'd be the same shade of red. But if we look at the population densities, the density of the larger one is 167 people per square kilometer. The density of the smaller one is 3,232 people per square kilometer, so way higher. The density of one is 19 times higher than the other. So, which do you think makes more sense in terms of trying to compare things? I think you probably see that it usually makes more sense to normalize or standardize your data if there's some bias taking place, and with choropleth maps and geography, area is the most common thing that you'd want to account for.

Just to summarize, we can look at total versus derived values. Total values are things like population counts, which are not normally used for choropleths unless you have a good reason to use them. That's not to say that it's absolutely forbidden or that the software won't let you. It's nothing like that. It's just that you have to be conscious of these things and make that decision intentionally. So, you wouldn't normally use them for things like population per census tract, as I've just shown you. What we do prefer to use for choropleths are derived values. Those are ratios involving area, like the normalizing we were just doing, so population density per census tract, or ratios that are independent of area, things like per capita income for a census tract. Those are things that are not biased by area, and so that's perfectly fine.
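If you want to check the arithmetic on those two tracts yourself, here's a minimal sketch in Python. The populations and densities are the figures quoted above; the variable names are mine, and the implied areas are just back-calculated as population divided by density, so treat them as approximate.

```python
# Figures quoted in the lecture for the two census tracts.
large_tract = {"population": 12909, "density_per_km2": 167}
small_tract = {"population": 13530, "density_per_km2": 3232}

# How far apart are the raw counts, relative to the larger count?
count_gap = (small_tract["population"] - large_tract["population"]) / small_tract["population"]

# How far apart are the densities?
density_ratio = small_tract["density_per_km2"] / large_tract["density_per_km2"]

# Back-calculated areas (an approximation: area = population / density).
large_area_km2 = large_tract["population"] / large_tract["density_per_km2"]
small_area_km2 = small_tract["population"] / small_tract["density_per_km2"]

print(f"Counts differ by {count_gap:.1%}")
print(f"Densities differ by a factor of {density_ratio:.1f}")
print(f"Implied areas: {large_area_km2:.1f} km2 vs {small_area_km2:.1f} km2")
```

The counts come out less than five percent apart while the densities differ by a factor of about 19, so a count-based choropleth would paint the two tracts the same and a density-based one would not.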
As an example of that, I just made this map for fun. It's a ratio of males to females for different parts of the city. These are different neighborhoods, and I just put the labels on here if you're familiar with Toronto, or if you're not. I've used a diverging color scheme: where the ratio is very close to one, in other words there are equal numbers of males and females, we've got this gray, and then the diverging colors show increasing amounts of these ratios, either higher or lower than one. So, as you can see here, if you're looking for a lady, downtown is the place to be; if you're looking for a man, then I'd say get to West Humber-Clairville or Wexford/Maryvale. I'm just joking around here, but it's actually interesting to see that there is variation in the ratio over the city. I have no idea why that is, but this is a way of showing data that's not biased by area, because all we're doing is dividing one by the other. It's a ratio of one thing to another, and the size of the census tract will have no effect on how many men versus women there would be in a particular location.

So, appropriate data for choropleths are things like statistical or political boundaries, where people have drawn these boundaries to count things like people. This might be population per census division, something like that. Choropleths are usually not used for continuous data. The reason is that the distribution of the data is not related to the boundaries that are being used to count things. For example, you could do this, but it really wouldn't make much sense to make a map of average rainfall per census division, because there's no connection or relationship between those two variables. Census divisions were designed to count people for a census; that has nothing to do with rainfall, and the amount of rainfall that falls at a location has nothing to do with these boundaries.
So, yes, if you had boundaries like watersheds or something like that, that would make sense, and you could do a choropleth for that. But again, that's an exception; it's not the most common way of doing things.

Let's take this to an extreme just to see how this works. I've got a map here of Toronto neighborhoods, and here we have elevation data. This is an example of continuous data. Say I made a choropleth map of elevation per neighborhood. Yes, I can do that, it's certainly possible, but is this meaningful? Is this really useful? Not really. I had a little fun with this. I turned it into an extruded prism map, or 3D map, just to show you visually how this is actually working. If we took these elevation values for each neighborhood and extruded them, this is really what the choropleth is implying: that you have these perfectly flat areas in terms of elevation, and then as you get right to the edge of one of these boundaries you fall off a cliff, and then you get to the next perfectly level area, and then you'd have to climb up this cliff to get to the next perfectly level area, and so on. Of course, that's ridiculous. That's not the way elevation works. So, why would you use these boundaries to represent a variable that's not related to those boundaries?

It's different for people. If I show it this way for people, that's fine, because these boundaries were designed for people. All it's really saying is there's this many people in this area and this many people in that area. There's nothing bad or unrelated about that. It makes total sense, so make a choropleth for that, but don't make one for something that's not related to the boundaries that are being used. This is a better way of showing that data: this is continuous data with the neighborhoods draped on top of it.
This is quite exaggerated, and Toronto is not nearly as dramatic in terms of the terrain, but you get the idea: you've got a lot of variation within any particular neighborhood, and it's not really going to be representative to show that as one flat choropleth unit.
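To put some numbers on that idea, here's a small sketch in Python. The elevation samples and neighborhood names are invented purely for illustration (they are not real Toronto data). It collapses each neighborhood to the single value a choropleth would show and reports how much variation that hides.

```python
from statistics import mean

# Hypothetical elevation samples (metres) taken at points inside two
# adjacent neighborhoods. These values are made up for illustration.
elevation_samples = {
    "Neighborhood A": [76, 82, 95, 110, 131],
    "Neighborhood B": [120, 124, 139, 150, 162],
}

summary = {}
for name, samples in elevation_samples.items():
    summary[name] = {
        # The one flat value a choropleth would assign to the whole unit.
        "choropleth_value": mean(samples),
        # The spread of real values that the flat value hides.
        "hidden_range": max(samples) - min(samples),
    }

for name, stats in summary.items():
    print(f"{name}: mapped as {stats['choropleth_value']:.1f} m, "
          f"hides a {stats['hidden_range']} m range")

# The highest point in A is above the lowest point in B, yet the choropleth
# implies a single step (a "cliff") between the two flat values.
overlap = max(elevation_samples["Neighborhood A"]) > min(elevation_samples["Neighborhood B"])
print("Neighborhoods overlap in elevation:", overlap)
```

With these made-up samples the two units overlap in elevation, but the choropleth would still draw them as two flat steps with a jump at the boundary.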