0:01

Now we'll look at a problem related to sorting, called selection, that's also well solved by Quicksort partitioning. This is a simpler problem. We're given an array of n items, and our task is to find the k-th smallest. There are lots of important applications for this. So, if we want to find the minimum item, that's k = 0; the maximum item, that's k = n - 1; the median, that's k = n/2. And there are many kinds of applications where people processing data want to find the top k, the median, or other order statistics. That's what selection is all about.

Now, here's an example where we want to use theory as a guide. What kind of efficiency might we expect in a selection algorithm? Well, first of all, it's easy to see that we can solve selection in n log n time. How would we do that? We just sort the array, and then if we want to find the smallest, we look in the first position; the largest, in the last position; the median, in the middle. In fact, if k is small, the running time is going to be proportional to n, because if you're looking for the smallest, you can find it in one pass through the array, and if k is two, you can find it in two passes. So you can imagine trying to find a selection algorithm that takes time proportional to n. And n is also a lower bound, because you have to look at everything; if you don't look at everything, you might miss the one item that you're looking for. So from these observations, it's clear that what we'd like is a selection algorithm that takes linear time. But at this point the question is: is there a linear-time algorithm that works for every k? Or is selection possibly as hard as sorting?
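The n log n baseline just described can be sketched as follows (a minimal sketch in Java; the class and method names here are just for illustration, not from the lecture):

```java
import java.util.Arrays;

public class SortSelect {
    // Return the k-th smallest item (k = 0 is the minimum, k = n - 1 the maximum)
    // by sorting first. The sort dominates the cost: about n log n compares.
    public static int select(int[] a, int k) {
        int[] copy = a.clone();   // leave the caller's array untouched
        Arrays.sort(copy);
        return copy[k];           // after sorting, the k-th smallest sits at index k
    }
}
```

This does strictly more work than selection requires, which is exactly the gap the rest of the lecture addresses.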

This kind of question plagued a lot of people in the late 1960s and early 1970s, as these types of problems emerged in computing applications. So it's an interesting question to think about, for sure.

Well, in his original paper in 1961, Hoare gave a solution to the selection problem based on partitioning, and the idea is really just a version of Quicksort. We do our partitioning so that we get entry a[j] into place in the array: nobody to the left is larger, nobody to the right is smaller. But then, when we're doing selection, we just go into one subarray or the other, depending on where j is. If j = k, we're done; we've found the k-th smallest. If k is to the left of j, then we just do the left subarray, which is to set hi to j - 1. And if k is to the right of j, we just do the right subarray: set lo to j + 1. That's all this code does. We could make a recursive call, but this just does it by resetting the values of the parameters: do one partition, then check whether your k-th element is going to be in the left part or the right part, and reset lo or hi accordingly. If j equals k, you've found it and you return it, and you keep going until you get down to just one element. That's a Quicksort-like implementation solving the selection problem, called quick-select. Notice again that it depends on the random shuffle at the beginning; that's going to be important for performance.

All right. So there needs to be a mathematical analysis to characterize the running time of this program, and the fact is that quick-select takes linear time on the average. We won't give the full proof.
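Before the analysis, the partition-and-shrink loop just described can be sketched like this (a sketch in Java under the assumptions above, using a standard Hoare-style partition that puts a[lo] into its final position; this is not the lecture's exact code):

```java
import java.util.Random;

public class QuickSelect {
    // Return the k-th smallest item (k = 0 is the minimum).
    public static int select(int[] a, int k) {
        shuffle(a);                       // the random shuffle is the performance guarantee
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi); // a[j] is now in its final sorted position
            if      (j < k) lo = j + 1;   // k-th smallest lies in the right subarray
            else if (j > k) hi = j - 1;   // k-th smallest lies in the left subarray
            else            return a[k];  // j == k: partitioned into place, done
        }
        return a[k];
    }

    // Put a[lo] into position j so that nothing to the left of j is larger
    // and nothing to the right of j is smaller; return j.
    private static int partition(int[] a, int lo, int hi) {
        int v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (a[--j] > v) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        int t = a[lo]; a[lo] = a[j]; a[j] = t;
        return j;
    }

    // Knuth shuffle: each permutation equally likely.
    private static void shuffle(int[] a) {
        Random rnd = new Random();
        for (int i = a.length - 1; i > 0; i--) {
            int r = rnd.nextInt(i + 1);
            int t = a[i]; a[i] = a[r]; a[r] = t;
        }
    }
}
```

Note that instead of recurring, the loop just narrows [lo, hi] to whichever subarray contains rank k, exactly as described above.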

It's actually quite a bit more complicated than the one we just did for Quicksort. But intuitively, we can see roughly what happens: each partitioning step splits the array approximately in half. If it split exactly in half each time, you'd have n + n/2 + n/4 + ... compares, which adds up to about 2n, so linear cost. Actually, it doesn't cut the array in half exactly each time, only on average, so you need a fuller analysis, like the one we did for Quicksort. The bottom line of that analysis gives the number of compares required as a function of n and k, in terms of a formula, and if you plug in k = n/2, you get the result that the number of compares required to find the median, which is the highest value the formula can take, is (2 + 2 ln 2) n, about 3.4 n. So: linear time to find the k-th smallest, for any value of k. Now again, this is a method that's linear time on the average. It's actually going to be quadratic in the worst case, but the chance of that happening, with a random shuffle, is less than the chance that you'll be struck by lightning. It's a probabilistically guaranteed fast algorithm. Now, from a theoretical standpoint, that's

a little unsatisfying, and in 1973 there's a famous paper that found a compare-based selection algorithm that's guaranteed to solve the problem in linear time. This is a real landmark in the theory of algorithms, because for a long time it wasn't known: we knew we could get linear time on the average, but could we get linear time in the worst case? That paper found such a construction. Now in practice, the constants in this construction are rather high, so the method is not really used in practice. And so there's still the goal of a fast, guaranteed linear-time selection algorithm. This is another example where we use theory as a guide: it's still worthwhile to look for a practical linear-time worst-case algorithm, and maybe somebody in this class will invent one someday. But until something like that is discovered, use quick-select, based on Quicksort partitioning; you get linear-time selection when you don't need a full sort. So selection is a simple problem, like sorting, that is well solved with Quicksort partitioning.
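For reference: the formula mentioned in the analysis appears on the lecture slide and isn't captured in this transcript. The standard average-case compare count for quick-select (an assumption here, though it is consistent with the 2 + 2 ln 2 coefficient quoted above) is:

```latex
% Expected number of compares for quick-select on n items, seeking rank k:
C_{n,k} \;\sim\; 2n \;+\; 2k \ln\frac{n}{k} \;+\; 2(n-k)\ln\frac{n}{n-k}

% Plugging in k = n/2 (the median, where the formula is largest):
C_{n,n/2} \;\sim\; 2n + 2\cdot\tfrac{n}{2}\ln 2 + 2\cdot\tfrac{n}{2}\ln 2
          \;=\; (2 + 2\ln 2)\,n \;\approx\; 3.39\,n
```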
