The first string-sorting algorithm that we're going to look at is actually the basis for several more complicated algorithms. It's called key-indexed counting, and it's very useful in a particular, special situation. Let's take a quick review of where we left off with sorting. So we considered a number of sorting algorithms, starting with insertion sort, and then mergesort, quicksort, and heapsort. And we got to the point where we could find an algorithm, heapsort, that guarantees to sort n items in time proportional to N lg N, without even using any extra space. Unfortunately, it's not stable. And all these algorithms are useful for any type of generic key, as long as it implements the compareTo operation. And not only that, we proved that any algorithm that just uses compares has to use a number of compares proportional to N lg N. So in a very important sense, mergesort or heapsort, for example, are optimal: you can't use asymptotically fewer compares, and with heapsort you can't use asymptotically less extra space. So why do we consider other sorting algorithms? There's a lower bound, so why are we thinking about this? The question is, can we do better? And obviously, we're here because the answer is that we can do better, if we don't depend on compares. The one assumption made by the lower bound is that we use compares, but we don't always need to use compares. So let's look at an example. Key-indexed counting is a fine example of that, and it's representative of a fairly common situation in sorting applications, where it happens that the keys we're using to sort are small integers. So in this case, this is supposed to mimic an application where there are students assigned to sections, there are not too many sections, and we want to get the thing sorted. We want to distribute the students by section, so we sort according to the section number, and that's a small integer.
And the implication of knowing that the key is a small integer is that we can use the key as an array index. Then, by using the key as an array index, we can arrange for a fast sort. There are lots of applications for that: maybe you have phone numbers that you want to sort by area code, or you have strings that you just want to sort by the first letter. And actually, that idea leads to an efficient sorting algorithm, in two different ways. Now don't forget that we're sorting according to a sort key, but usually we're sorting bigger generic items that have other information associated with them. If you were just sorting the small integers themselves, you could just count how many 1's there are, how many 2's there are, and so forth, in one pass, and then if there are three 1's, just output three 1's, and so on. But the complication is that we have to carry the associated information along, so we have to work a bit harder than that. So here's the code for this method, called KeyIndexedCounting, and let's look at a demo. Now, to make this a little less confusing, and not have so many numbers, we're going to use lowercase a for 0, b for 1, c for 2, and so forth. So each key is its letter's offset from a, however you want to think of it, and we're only going to look at 6 letters. So we're supposing that we're sorting an array that holds six different small integers, and we're using lowercase letters to represent the integers so that we can easily distinguish between the keys and the indices. So now let's look at the processing for this. The first thing that we do is go through and count the frequency of occurrence of each letter. The way that we do that is we keep an array. Now, that array actually has to be one bigger than the number of different keys that we have, the number of different small integers. So in this case, an array of size seven.
And just to make the code a little cleaner, we keep the number of a's in count of 1, the number of b's in count of 2, and so forth. Once we set that up, it's trivial to go ahead and count the frequencies. We simply go through our input, for i from 0 to n. When we access a value in our input, it's a small integer, so it's 0, 1, 2, 3, 4, or 5. We simply add 1 to that integer and use the result to index into the count array. So when we see an a, that's 0, we're incrementing count of 1; when we see a b, that's 1, we're incrementing count of 2, and so forth. In this case we increment the count corresponding to d, then a, then c, and so on. Every time we encounter a key, we just simply increment one of these counters. In one pass, we get an array that gives us the number of a's, b's, c's, d's, e's, and f's. That's the first pass of key-indexed counting: count the frequencies of each letter, using the key as an index. Now the next step is called computing the cumulates, and that's a really easy thing as well. All we do is go through the count array and, at each step, add the current entry to the sum computed so far. So, looking back, we had two a's and three b's. That means there are five letters less than c, namely the a's and the b's. And there are six letters less than d, and eight letters less than e, and so forth. That's obtained by starting with 2, adding 3 to get 5, adding 1 to get 6, and so on. With that one pass through the count array, we can find out, for example, that there are six keys less than d and eight keys less than e. And those cumulates tell us where the d's go in the output: there are 6 keys less than d and 8 keys less than e, so the d's have to go in positions 6 and 7. So this is an array of indices that is going to tell us how to distribute the keys in the output. That's the next step: access the cumulates, using the key as index, to move the items.
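The two passes described so far can be sketched in Java like this. It uses the twelve keys from the demo; the class and variable names are my choices, not from the lecture, but the shifted-by-one count array is exactly the trick described above.

```java
public class CountCumulates {
    public static void main(String[] args) {
        // The twelve demo keys: lowercase letters standing for the integers 0..5.
        char[] a = {'d', 'a', 'c', 'f', 'f', 'b', 'd', 'b', 'f', 'b', 'e', 'a'};
        int R = 6;                       // six distinct key values, a through f
        int[] count = new int[R + 1];    // one extra slot: a's counted in count[1], etc.

        // Pass 1: count frequencies, using the key (shifted by 1) as an index.
        for (char c : a)
            count[c - 'a' + 1]++;

        // Pass 2: compute the cumulates by adding each entry into the running sum.
        // Afterward, count[r] holds the number of keys less than key r.
        for (int r = 0; r < R; r++)
            count[r + 1] += count[r];

        System.out.println(count['d' - 'a']);   // number of keys less than d
    }
}
```

Running this prints 6, matching the hand computation above: two a's plus three b's plus one c means six keys less than d.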
So let's take a look. Remember, when we see an a, we count that as 0, so we go to the count array and access the entry for a. We go through the whole array to be sorted and move each key exactly to where it has to go, one item at a time. So when i is 0, we're looking at the d. The count array entry corresponding to d has 6, so that says to put the d in position 6, and then increment that entry. That means if we see another d, it's going to go into 7. And these entries, the way we precomputed them, are not going to run into one another. So now a: that goes in 0, and we increment the count array entry corresponding to a. The next thing is c, so that says to put it in 5, and then increment the count array entry corresponding to c. And f, we put it in 9. Next is another f, which we put in 10. Next is b, which we put in 2. So you can see, the keys from the input are getting distributed in the output according to the counts and the cumulates that we've precomputed. Now we get the other d, which goes into 7. We get another b, which goes into 3, and the entry increments to 4 to show where the next one goes. f goes into 11, the last b goes into 4, the e goes into 8, and the second a goes into 1. We move the items, again, simply by using the key as index into the count array. Then the last step is just to copy the sorted array back into the original input. That's the demo of key-indexed counting. Quick summary of key-indexed counting: we make one pass through the array to count frequencies of each letter, using the key as an index. Then we go through that count array to compute the cumulates, just by adding each entry into the running sum. Then we use those cumulates, accessed using the key as an index, to actually move the items over and get them into sorted order. And then we move them back into the original array. What's the running time of this algorithm?
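Putting the four passes together, the whole method can be sketched as below. This is a minimal, char-keyed version of the technique the lecture describes; the actual course code sorts generic items by a key, so treat this as an illustration on the demo data rather than the official implementation.

```java
public class KeyIndexedCounting {
    // Sort a[], whose keys are the R characters 'a' .. 'a'+R-1.
    static void sort(char[] a, int R) {
        int n = a.length;
        char[] aux = new char[n];
        int[] count = new int[R + 1];

        for (int i = 0; i < n; i++)          // pass 1: count frequencies
            count[a[i] - 'a' + 1]++;
        for (int r = 0; r < R; r++)          // pass 2: compute cumulates
            count[r + 1] += count[r];
        for (int i = 0; i < n; i++)          // pass 3: distribute to aux[],
            aux[count[a[i] - 'a']++] = a[i]; //   incrementing the entry each time
        for (int i = 0; i < n; i++)          // pass 4: copy back
            a[i] = aux[i];
    }

    public static void main(String[] args) {
        char[] a = {'d', 'a', 'c', 'f', 'f', 'b', 'd', 'b', 'f', 'b', 'e', 'a'};
        sort(a, 6);
        System.out.println(new String(a));
    }
}
```

On the demo input this prints aabbbcddefff, the sorted order traced above. Note how pass 3 is exactly the demo: the first d lands at index 6, the increment sends the second d to index 7, and so on.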
Well, the analysis is actually quite simple, because it's just a couple of loops through the array to be sorted and through the count array. The key fact to note is that it takes time proportional to N + R, and space proportional to N + R. Now R, remember, is the number of different character values: for ASCII, maybe that's 256, and for genomic data, maybe it's 4. And we're assuming N is huge, since we're sorting big files, so really, this is linear time in many, many practical situations. There's also the question of whether it's stable. Yeah, it's actually stable, because when we do the move, we move items with equal keys in the order that we see them. We keep them in the order that we see them, and that's just the way the method works. So for this special situation, we have a linear-time, stable sorting method, which beats the N lg N bound, and is useful in many practical situations.
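Since stability is the payoff here, it's worth seeing it in the student/section framing the lecture opened with. The sketch below is mine: the Student record, the names, and the section numbers are made up for illustration (and records need Java 16+), but the sort is the same key-indexed counting, carrying the associated information along. Students in the same section come out in their input order.

```java
public class StableSectionSort {
    // A student record: a name plus a small-integer section key in 0 .. R-1.
    record Student(String name, int section) { }

    // Key-indexed counting on the section key; returns a new sorted array.
    static Student[] sort(Student[] a, int R) {
        Student[] aux = new Student[a.length];
        int[] count = new int[R + 1];
        for (Student s : a) count[s.section() + 1]++;          // frequencies
        for (int r = 0; r < R; r++) count[r + 1] += count[r];  // cumulates
        for (Student s : a) aux[count[s.section()]++] = s;     // distribute, stably
        return aux;
    }

    public static void main(String[] args) {
        Student[] roster = {                                   // illustrative data
            new Student("Anderson", 2), new Student("Brown", 3),
            new Student("Davis", 3),    new Student("Garcia", 4),
            new Student("Harris", 1),   new Student("Jackson", 3),
        };
        for (Student s : sort(roster, 5))
            System.out.println(s.section() + " " + s.name());
    }
}
```

The three section-3 students appear as Brown, Davis, Jackson, exactly their order in the input, because the distribute pass scans left to right and only ever increments the count entry.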