With an equal interval data classification, we're dividing the data into equal ranges. So, if you key into these blue lines here, those are the class boundaries for this dataset, and they're equally spaced apart in terms of the value difference from top to bottom. These aren't exactly round numbers here, but if you had values of 1 to 20, 21 to 40, 41 to 60, that kind of thing, you'd have a difference of 20 for each class. That's an equal interval.

So you might think, well, why don't we just use equal interval all the time? It sounds pretty straightforward, so it must be pretty good. It's easy, as I say here, for a non-technical audience to interpret. If you're showing a map to people who aren't that fluent in math or in that particular dataset, and you want to make sure it's easy for anyone to comprehend, then all else being equal, you'd think an equal interval would work well. It's a good way to start: it gives you a nice simple legend, with class boundaries that are easy to interpret. Great.

The problem with it, though, as I've mentioned, is what happens if you have a clustering of values. For example, let me just clear that. Here we have this distribution of values. Notice that the vast majority of values are in the first two classes, and that could end up biasing the interpretation of your map: it will make it look as though all of the values are very similar to one another, since they're either in class one or class two. That's what I'm showing with this version of the data up here. So equal interval is something that could work but often doesn't. You can use it, but it's not necessarily always going to work.
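If you'd like to see the mechanics in code, here's a minimal sketch of how equal interval breaks are computed. The values and the class count are made-up examples, not the dataset from the slides.

```python
import numpy as np

def equal_interval_breaks(values, n_classes):
    """Interior class boundaries that split the data range into
    n_classes equally wide intervals."""
    lo, hi = np.min(values), np.max(values)
    width = (hi - lo) / n_classes  # every class spans the same range of values
    return lo + width * np.arange(1, n_classes)

# Made-up, skewed data: most values cluster near the low end.
values = np.array([2, 3, 4, 4, 5, 6, 7, 8, 9, 12, 15, 48, 95])

print(equal_interval_breaks(values, 5))
# [20.6 39.2 57.8 76.4] -- the first class (2 to 20.6) captures 11 of
# the 13 values, which is exactly the clustering problem described above.
```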
With a quantile data classification, we have an equal number of features in each class. So if we had, say, 100 data values, we'd sort them from lowest to highest, and then, if we had five quantiles for example, we'd put the lowest 20 values in the first class, the next 20 values in the next class, and so on. This emphasizes relative position, so it's often used for things like income: who's in the bottom quantile or quartile, who's in the top, who's in the middle?

One caveat is that features with similar values may end up in different classes, which can exaggerate differences. That's something that can happen with different data classification methods, but this is one where I think it particularly stands out. If you have a value that's very close to another value, but it just so happens that one gets grouped into one quantile and the other into the next, they'll end up with different colors, even though the values themselves are very similar. It just so happened that one was among the bottom 20 values and the other was in the 20-to-40 group, or whatever it is. So the map implies there's a difference there when really there isn't. It's not necessarily something you can do a lot about, but it's something you should keep in mind when you're making these: just know that it can happen.

As I say here, the reverse can also happen: you can have a wide range of values in the same class, minimizing difference. That's what you see here with this class. There's actually quite a range of values from here to here, but if those are the top 20 values out of 100, or whatever it is you're working with, then they all get grouped together. It doesn't matter if one of those values is way higher than all the rest; it still ends up in the same quantile. So it implies that those values are all the same, since they'll end up being the same color. Remember, this is all about how the data will look on a map, and the map is implying those values are the same when maybe they're not.

It can get confusing with quantiles because, depending on how many quantiles you have, they may be called different things. For example, if you just have two quantiles, all that means is that you've divided the values at the median: half of the values are below the median, and the other half are above. If you have three quantiles, they're known as tertiles, which I don't hear that often, but quartiles you hear a lot; that just means four quantiles. If you have five, they're quintiles. If you have ten, they're deciles. If you have 100, they're percentiles. It's all exactly the same idea: you're taking all the values you have, sorting them from lowest to highest, and then dividing them up based on the number of classes, so four quartiles, five quintiles, and so on. You may hear me or other people refer to things like quartiles or percentiles, so refer back to this if you need to, because sometimes the terminology can be a little confusing.
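Here's the same kind of sketch for quantile breaks, again with made-up numbers, using NumPy's quantile function. Notice how the top class lumps together one value that's far above the rest, which is exactly the wide-range-in-one-class issue described above.

```python
import numpy as np

def quantile_breaks(values, n_classes):
    """Interior break points that put (roughly) the same number
    of values in each class."""
    qs = np.linspace(0, 1, n_classes + 1)[1:-1]  # e.g. 0.2, 0.4, 0.6, 0.8
    return np.quantile(values, qs)

values = np.array([2, 3, 4, 4, 5, 6, 7, 8, 9, 12, 15, 48, 95])

print(quantile_breaks(values, 5))
# [ 4.   5.8  8.2 13.8] -- 15, 48 and 95 all land in the top class
# together, even though 95 is far above the others.
```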
Natural breaks is an interesting data classification method because it's customized to the distribution of each dataset you're working with. It's based on the natural groupings of the data values themselves. Now, I'm actually not going to explain how this works, because it's a bit complicated and a little beyond the scope of what we're doing here, but suffice it to say that essentially what it's trying to do is ask: is there a cluster of values at the bottom, another cluster in the middle, and a cluster at the top, with breaks in between them? Ideally, the software goes through and looks for those and says, these values are all clustered together, so we'll put them in one class; these ones are clustered together, so they'll go in another class; and so on.

It works so well that it's actually the default data classification method in ArcMap. So it's a good one, kind of a go-to, though as I always say, don't ever just go with the default and hope for the best. You should experiment, think about it, and make a conscious decision. But if you do use natural breaks, it's a good one, as I say here, when the values are not evenly distributed in your histogram, because it's a way of enhancing or maximizing the differences in the distribution of that dataset.

As I say, it can be difficult to choose the optimum number of classes, especially if the data are evenly distributed. That's the downside: if you have something like this, there really aren't any natural breaks that I'm seeing; it's just a pretty normal distribution. But if I've told the software that I want to use five classes with natural breaks, it will do its best to find five breaks in that data, even if they aren't really there. Having said that, it still works remarkably well most of the time. I use it a lot. It's a good one.

The only big downside, I suppose, is that when you're showing this map to other people, the class boundaries are not going to be that intuitive, because they're not equal interval and they're not quantiles, both of which I think people are used to seeing. With natural breaks, there's no rhyme or reason to where those breaks actually are, from a map reader's point of view. It's not as though every time someone looks at the map you can say, by the way, here's the histogram, and this is why I did it that way; they're just going to see your map with the classes on it. So you have to balance that out. If you end up with a distribution of values and colors on your map that makes the most sense, tells the best story, and gives the most accurate representation of that data, then it's worth it. But if you could use something else that makes the legend a little easier to read and still gives you a good distribution of values on your map, then you may want to consider that instead. I'm not trying to downplay natural breaks; those are just some of the things you might want to consider when using it.
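I said I wouldn't explain the algorithm, but for the curious: natural breaks is closely related to optimal one-dimensional clustering, so here's a rough sketch that uses scikit-learn's k-means as a stand-in. Real implementations, such as the one in ArcMap or the jenkspy package, use an exact search rather than k-means, so treat this only as an illustration of the idea.

```python
import numpy as np
from sklearn.cluster import KMeans

def natural_breaks_approx(values, n_classes, seed=0):
    """Approximate natural breaks with one-dimensional k-means:
    group the values so each class is internally similar, then put
    each break point in the gap between neighbouring groups."""
    values = np.sort(np.asarray(values, dtype=float))
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed)
    labels = km.fit_predict(values.reshape(-1, 1))
    # A break falls wherever two adjacent sorted values were
    # assigned to different clusters.
    return [(values[i] + values[i + 1]) / 2
            for i in range(len(values) - 1)
            if labels[i] != labels[i + 1]]

values = [2, 3, 4, 4, 5, 6, 7, 8, 9, 12, 15, 48, 95]
print(natural_breaks_approx(values, 3))
# The two breaks land in the obvious gaps: one between 15 and 48,
# one between 48 and 95.
```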
The standard deviation data classification method is not the same as the others, in that it's not grouping the data values themselves into classes. What it's doing is looking at how far a particular data value is from the mean, or average, of that distribution of data, and then assigning it to a class based on that. So let's have a look at how this works.

Here we have the mean for this dataset. Okay? You'll notice that this data is slightly skewed; that's okay. Here we have the lines that indicate the standard deviations away from the mean, and if you're not statistically inclined, or you don't have a background in this, that's okay too. All you really need to know is that a standard deviation is a number that indicates how far a value is from the mean. Anything within this distance is within one standard deviation of the average. So here's our average here; if we had a number line, we'd say this value is within one standard deviation of the mean. Great. If it's over here, it's within two standard deviations of the mean, which tells somebody it's farther from the average. Here would be three standard deviations from the mean; that's even farther. So you're getting values that you can think of as more extreme, or farther from the average. That's all it really means, pardon the pun.

Now, where this can get slightly confusing is that the software doesn't use the standard deviation lines themselves as the class boundaries. You'd think it would, but it actually makes more sense not to. What happens is that it takes that distance, one standard deviation, and uses it to straddle the mean, because what we're really interested in is how close values are to the mean. If they're just above the mean or just below it, they should really be in the same class, the same category, the same color. So that's what they've done: they've said, okay, this is one standard deviation, so we'll have half of it above the mean and half below it. Then anything in that range will be assigned to a class that's around the average, and then we have classes that are farther away, and so on.

So we can take a color scheme, a diverging color scheme, where we have something neutral for what's right around the average, and then progressively darker colors, like a green and a red, for each class farther out. What this does, when we apply it to a map and look at the legend, is give us this nice diverging color scheme: these are values that are right around the mean, these are at increasing distance above the mean, and these are at increasing distance below it. That's all it is. It's a way of being able to show this on a map, and let me show that to you now.

Here's a little thumbnail of it, and here's a better version. What this tells us, for mean income in standard deviations, is not the income itself, but how close each area is to the mean. The beige areas on the map, which you can see here and here and so on, are the ones that are right around average. The ones that are dark green are far above the mean, and the ones that are red are far below the mean, so here and here. Interestingly, just based on this map I put together to show you, a lot of the areas, with this color scheme and this data classification, are remarkably close to the mean: they're within one standard deviation. We have a lot of areas that are light green and a lot that are light pink or red, and that tells us that a fairly good chunk of the city is not that far from the mean. If you were, say, a politician or someone interested in policy, you might be more interested in the areas at the extremes, either particularly wealthy or particularly in need, and that might influence the decisions you make in the policies you're crafting.

Here's the same diagram I showed you a minute ago, just to make sure this is all clear. This dark green here ends up there, and that's what it looks like on the map. These are the areas that are right around the mean, so that's a beige area there, and so on. You get the idea.

Often what will happen is that you'll want a map of the actual values using a different data classification method, maybe equal interval or quantiles or whatever, and then you might show the standard deviation map as well to complement or supplement it, so that someone can see both of those things and get a sense of the trends. It's one thing to look at the values and say this is high or this is low; it's another to say how close they are to the average, or how far away.

This is just a summary of what I was saying, with a couple of extra points. The standard deviation is the amount that the value of a feature varies from the mean. It's good for seeing which features are above or below average. The data should have a fairly normal distribution. The map is not showing actual values, only how far those values are from the mean. And outliers can skew the mean: as with lots of things related to averages, if you have one value that's way over here, it will drag the mean over, and that will have an influence on the way the rest of your map looks.
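Here's a minimal sketch of that straddled class assignment. The 0.5 and 1.5 standard deviation cut-offs are illustrative; the exact cut-offs vary by software.

```python
import numpy as np

def std_dev_classes(values, cutoffs=(-1.5, -0.5, 0.5, 1.5)):
    """Assign each value to a class by its distance from the mean,
    measured in standard deviations. The middle class straddles the
    mean (-0.5 SD to +0.5 SD), so values just above and just below
    average end up in the same class, and the same color."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()  # distance from mean, in SDs
    return np.digitize(z, cutoffs)  # 0 = far below mean ... 4 = far above

values = [5, 20, 30, 45, 55, 70, 80, 95]
print(std_dev_classes(values))
# [0 1 1 2 2 3 3 4] -- 45 and 55 sit on opposite sides of the mean (50)
# but share the middle class, because that class straddles the mean.
```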
As a comparison, if we look at median income versus the standard deviation for that same variable, this is a way of telling somebody: here's the pattern of the actual median incomes, with wealthy areas here and less well-off areas over here in terms of the actual income values, while the standard deviation map also shows us something. These areas are far above average, and this area here, if you look over here, is below average, but it's within, what would that be, 1.5 standard deviations of the mean. That gives us a little more context in terms of how we might want to interpret that dataset.

So, here are the four different classification methods we've talked about; three of them are using the same dataset. I just want to emphasize one more time that it's really important to pay attention to the geographic and statistical distribution of your data, and how that relates to your choice of method and the number of classes you use. Make sure that the resulting map is representative of what it is you're trying to say about that set of data and its distribution, and that it's true to the story you're trying to tell.
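If you want to experiment with all four methods side by side, the PySAL mapclassify package implements each of them; assuming you have it installed, a quick comparison might look like this, with made-up skewed values standing in for something like income.

```python
import numpy as np
import mapclassify

# Made-up, positively skewed values standing in for something like income.
y = np.random.default_rng(0).lognormal(mean=10, sigma=0.5, size=500)

for scheme in (
    mapclassify.EqualInterval(y, k=5),  # equal-width classes
    mapclassify.Quantiles(y, k=5),      # equal-count classes
    mapclassify.NaturalBreaks(y, k=5),  # natural groupings in the data
    mapclassify.StdMean(y),             # classes in SDs around the mean
):
    print(scheme)  # each classifier prints its breaks and class counts
```

Comparing the printed class counts is a quick way to see, before you ever touch the map, which method leaves most of your features piled into one or two classes.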