0:17

Let's take a quick review of where we left off with sorting.

So we considered a number of sorting algorithms,

starting with insertion sort, and then mergesort, quicksort, and heapsort.

And we got to the point where we found an algorithm, heapsort, that guarantees to sort N items in time proportional to N lg N, without even using any extra space.

Unfortunately, it's not stable.

And all these algorithms are useful for any type of generic key, as long as it implements the compareTo operation.

And not only that, we proved that any algorithm that just uses compares has to use a number of compares proportional to N lg N.

So in a very important sense, mergesort or heapsort, for example, are optimal: you can't use asymptotically fewer compares, and with heapsort you can't use less extra space.

So why do we consider other sorting algorithms?

1:39

There's a lower bound, so why are we thinking about this?

The question is, can we do better? And obviously, we're here because the answer is that we can do better, if we don't depend on compares.

The one assumption made by the lower bound is that we use compares, but we don't always need to use compares.

So let's look at an example.

Key-indexed counting is a fine example of that, and it's representative of a fairly common situation in sorting applications, where the keys that we're using to sort happen to be small integers.

So this case is supposed to mimic an application where there are students assigned to sections, there aren't too many sections, and we want to get them sorted by section.

Â 3:26

Now don't forget that we're sorting according to a sort key, but usually we're sorting bigger generic items that have other information associated with them.

If you were just sorting the small integers themselves, you could count, in one pass, how many 1's there are, how many 2's there are, and so on. Then, if there are three 1's, just output three 1's, and so forth.

But the complication is that we have to carry the associated information along, so we have to work a bit harder than that.

So here's the code for this method, called key-indexed counting, and let's look at a demo.
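A sketch of that code in Java might look like the following. This is not necessarily the course's exact listing; here the keys are the characters 'a' through 'f', so the small integer for a key is just the letter minus 'a', and R = 6 is the number of distinct key values:

```java
public class KeyIndexedCounting {
    // Sorts an array of keys drawn from 'a' .. ('a' + R - 1), stably.
    static char[] sort(char[] a, int R) {
        int N = a.length;
        char[] aux = new char[N];
        int[] count = new int[R + 1];          // one bigger than the number of keys
        // Pass 1: count frequencies, with key r tallied in count[r + 1].
        for (int i = 0; i < N; i++)
            count[a[i] - 'a' + 1]++;
        // Pass 2: compute cumulates; afterwards count[r] = number of keys < r.
        for (int r = 0; r < R; r++)
            count[r + 1] += count[r];
        // Pass 3: distribute, using the cumulates as output indices.
        for (int i = 0; i < N; i++)
            aux[count[a[i] - 'a']++] = a[i];
        // Pass 4: copy back.
        for (int i = 0; i < N; i++)
            a[i] = aux[i];
        return a;
    }

    public static void main(String[] args) {
        char[] a = "dacffbdbfbea".toCharArray();  // input assumed from the demo trace
        System.out.println(new String(sort(a, 6)));
    }
}
```

In a real application, each entry would be a full student record rather than a bare key, and the distribution pass would move the whole record, not just its key.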

4:12

So here's the key-indexed counting demo.

Now, to make this a little less confusing, and not so many numbers, we're going to use lowercase a for 0, b for 1, c for 2, and so on.

So each key is its letter's position in the alphabet, counting from a as 0, however you want to think of it, and we're only going to use six key values.

So we're supposing that we're sorting this array that holds six different small integers.

And we're using lowercase letters to represent the integers, so that we can easily distinguish between the keys and the indices.

So now let's look at the processing for this.

The first thing that we do is go through and count the frequency of occurrence of each letter.

The way we do that is we keep an array. Now, the array actually has to be one bigger than the number of different keys that we have, the number of different small integers, so in this case an array of size seven.

And just to make the code a little cleaner, we keep the number of a's in count[1], the number of b's in count[2], and so forth.

Once that's set up, it's trivial to go ahead and count the frequencies.

We simply go through our input, for i from 0 to N. When we access a value in our input, it's a small integer, so it's 0, 1, 2, 3, 4, or 5. We simply add 1 to that integer, and use it to index into the count array.

So when we see an a, that's 0, we increment count[1]; when we see a b, that's 1, we increment count[2]; and so forth.

So in this case we increment the count corresponding to d, and then a, and c, like that.

Every time we encounter a new key, we just simply increment one of these counters.

In one pass through, we get an array that gives us the number of a's, b's, c's, d's, e's, and f's.

That's the first pass of key-indexed counting: count the frequency of each letter, using the key as an index.
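As a small illustration, here is what that first pass might look like, run on the input the demo appears to use (d a c f f b d b f b e a, reconstructed from the trace in this walkthrough):

```java
class FrequencyCounts {
    // One pass over the input: key r is tallied in count[r + 1].
    static int[] counts(char[] a, int R) {
        int[] count = new int[R + 1];   // one bigger than the number of keys
        for (char c : a)
            count[c - 'a' + 1]++;       // 'a' bumps count[1], 'b' bumps count[2], ...
        return count;
    }

    public static void main(String[] args) {
        // 2 a's, 3 b's, 1 c, 2 d's, 1 e, 3 f's
        System.out.println(java.util.Arrays.toString(
            counts("dacffbdbfbea".toCharArray(), 6)));
    }
}
```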

6:33

Now the next step is called computing cumulates, and that's a really easy thing as well.

All we do is go through the count array and, at each step, add the current count to the sum computed so far.

So if we look back, we had two a's and three b's. That means there are five letters less than c, that's the a's and the b's. And with the one c, there are six letters less than d, eight letters less than e, and so forth.

And that's just obtained by: we start with 2, add 3 to it to get 5, add 1 to it to get 6.

With that one pass through the count array, we can find out, for example, that there are six keys less than d, and eight keys less than e.

And those cumulates tell us where the d's go in the output: there are 6 keys less than d and 8 keys less than e, so the d's have to go in a[6] and a[7].

So this is an array of indices that is going to tell us how to distribute the keys in the output.
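That pass over the count array might be sketched like this, starting from the frequencies of the previous step (which assume the demo input d a c f f b d b f b e a):

```java
class Cumulates {
    // Turn frequencies into cumulates in place:
    // afterwards count[r] = number of keys less than key r.
    static int[] cumulate(int[] count) {
        for (int r = 0; r + 1 < count.length; r++)
            count[r + 1] += count[r];
        return count;
    }

    public static void main(String[] args) {
        int[] count = {0, 2, 3, 1, 2, 1, 3};  // frequencies from the first pass
        System.out.println(java.util.Arrays.toString(cumulate(count)));
        // count[3] = keys less than d, count[4] = keys less than e
    }
}
```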

So that's the next step: access the cumulates, using the key as an index, to move the items.

So let's take a look. Now remember, when we see an a, we're just going to count that as 0, so we're going to go to count[0], and that will access the corresponding entry in the count array.

We go through the whole array to be sorted, and we move each key exactly to where it has to go, one at a time.

So when i is 0, we're looking at the d. The count array entry corresponding to d has 6, so it says: put the d in position 6, and increment that entry. That means if we have another d, it's going to go into 7.

And these entries, the way we precomputed them, are not going to run into one another.

So now a: that goes in 0, and we increment the count array entry corresponding to a.

The next thing is c, so that says to put it in 5, and then increment the count array entry corresponding to c.

And f, we put it in 9. Next is, sorry, another f, which we put in 10.

Next is b, which we put in 2.

So you can see, the keys from the input are getting distributed in the output according to the counts and the cumulates that we've precomputed.

So now we get the other d, which goes into 7.

We get another b, which goes into 3, and the counter increments to 4, which is where the next b will go.

f goes into 11, the last b goes into 4, the e goes into 8, and the second a goes into 1.
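The whole distribution pass can be sketched as follows, starting from the cumulates ([0, 2, 5, 6, 8, 9, 12], under the demo input assumed above):

```java
class Distribute {
    // Move each key to the slot its cumulate names, bumping the
    // counter so the next equal key lands one position later.
    static char[] distribute(char[] a, int[] count) {
        char[] aux = new char[a.length];
        for (int i = 0; i < a.length; i++)
            aux[count[a[i] - 'a']++] = a[i];
        return aux;
    }

    public static void main(String[] args) {
        int[] count = {0, 2, 5, 6, 8, 9, 12};  // cumulates: keys < each letter
        char[] a = "dacffbdbfbea".toCharArray();
        System.out.println(new String(distribute(a, count)));
    }
}
```

Because equal keys are moved in the order they appear in the input, this pass is exactly what makes the sort stable.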

10:50

And the key fact to note is that it takes time proportional to N + R, and space proportional to N + R.

Now R, remember, is the number of different character values, the size of the count array. For ASCII, maybe that's 256, and for genomic data, maybe it's 4.

And N, we're assuming, is huge, since we're sorting huge files. So really, this is linear time in many, many practical situations.

11:27

There's also the question of, is it stable? Yes, it's actually stable.

Because when we do the move, we move items with equal keys in the order that we see them. We keep them in the order that we see them, and that's just the way the method works.

So for this special situation, we have a linear-time, stable sorting method, which beats the N lg N bound and is useful in many practical situations.