0:03

So here's a better method.

It's a very classical method called binary search,

and it's quite intuitive.

So, the idea with binary search is that,

first of all, we put the whitelist array in sorted order.

Now we're going to have to talk about how to do that,

and that's the second part of this lecture.

If it's in sorted order,

then what we want to do is examine the middle key.

If that one's the one we're looking for,

then just return that index we found it.

If it's larger then we know,

that our key must be in the lower half of the array,

the ones with lower indices.

And if that one in the middle is smaller than our key,

then we have to look in the half with the larger indices.

So for example here's our list again,

and we're looking for oscar.

We check in the middle.

Is that key there eve?

Is that the key we are looking for?

And the answer is no.

Our key is bigger,

so we can eliminate the first half of the array and just look in the second half.

Then we look in the middle of that one.

In this case, It's peggy. Our key is smaller.

Now we can eliminate

the top half of the array and concentrate on the lower half of that part of the array.

And again, oscar it's not mallory,

so we know it has to be bigger, and eventually,

we get down to where we find our key,

or we get to an array of a size,

one that's not equal,

and we know that our key is not there.

One of the two must be true.

So in this case, we found a match,

Return 10. That's binary search.

So, it takes a little bit of arithmetic.

We'll use a math-like notation.

If we say a of square bracket, low, high,

where low and high are indices in the array,

and then regular parenthesis.

That means that we are considering a of lo,

a of lo+1, all the way up to hi-1,

but not hi itself.

It's an open interval in math,

and the same thing works for an array in Java.

And so if we're looking in this part of our array where we include lo and not hi,

hi is bigger than lo, or equal to it.

Then if we want to look in the middle,

the way we get the middle is subtract our two indices,

divide by two and then add that to lo.

So the number of elements is hi minus lo divide that by two,

that's half the size of the array,

and you add that to lo that's our middle.

And then once we've compared our key against mid,

we don't have to consider that one anymore.

So we just include it with

the open parenthesis to exclude that in the lower half of the array.

And then the upper half the array, we do mid+1,

and then again we use hi,

but exclude a of hi.

And this is just a detail that helps make the arithmetic easier to

check when we implement the recursive routine that's going to do this.

You need to study it a little bit.

It's easy to get this wrong.

But it's worth to take a quick look at this.

Okay, with that in mind, let's look at the Java code for implementing the search.

So, that's the method that we're supposed to implement,

taking our key and the array as arguments.

And we're going to call a recursive routine that also

includes the indices within the array that we're concerned with,

lo and hi, as on the previous slide.

So that's all set up.

The array goes from zero to a.length-1.

So with our open interval notation,

if we call the recursive routine from zero to a.length,

we're doing exactly what we want,

search the entire array.

And then recursively, we check that our hi is bigger than our lo.

If it's equal, then there's nothing in the array,

or even if it's less, there's nothing in the array.

So that's our termination condition.

That's an unsuccessful search.

We never found our key.

Then we compute the index of the middle element,

just as on the previous slide.

And we compare that element against our key,

and just result is -1,

if our key is less,

0 if it's equal,

and 1 if it's greater.

And then we just test that to see whether we're in the left part of the array,

in the right part of the array with bigger indices,

or if it's not greater or less than, it's equal to 0,

and then we return successfully with the index that contains the key.

So that's the recursive method that solves this problem.

And again, it's easy to make mistakes in this.

It's worth the study,

it's worth some study.

But it's basically a pretty simple program once you

make sure that you have some discipline and making sure that

you test for the case when it's empty,

or it's got just one key as we have.

Okay, so here's what it looks like,

say, with the recursion trace.

So in our example,

searching for oscar, we're searching from 0 to 15.

So to do that, then we compute 7.

Oscar is bigger than eve,

so then we go to the higher indices from 8 to 15.

And now the middle is 11,

oscar is less than peggy,

so now we go in the part with the lower indices, that's a through 10.

Now the middle is 9, mallory,

oscar is bigger than mallory,

so we go the part with higher indices, 10 and 11.

10 and 11 is just an array of size one,

and that's where oscar is.

So we return that index.

And then we just go back up the recursive tree,

returning that value all the way up to the main return 10,

and that's a successful search.

All right. What about the,

let's do a mathematical model to study this method?

For simplicity, we'll do

an exact analysis when the number of keys is a power of two minus one.

And you'll see in a bit why it works that way.

So if N equals two to the n, minus one,

then n is about log base two of N. It's actually equal to log raised two of N plus one.

But let's look at what happens in this case.

So, if you subtract one and then divide by two,

you are always going to get,

so for the first call,

it's always two to n minus one.

But if you subtract one and divide by two for the second call,

it's always going to be two to the n,

minus one, minus one.

And again, subtract one,

divide by two the size for the next call is two to the n minus two, minus one.

You keep going that way,

eventually we get down to always for the nth call,

the subarray size is going to be exactly one.

So every search miss is going to be a top to bottom path in this tree.

And the total number of compares,

so for every call,

we do one compare against our key against the middle key,

and its current array,

the total number compares is going to be n,

which is about equal to the log of N. That's

an indication of how binary search gains its efficiency.

So with that almost,

that is a proof that binary search uses about log and compares for a search miss.

And with a little more work,

you can show that same is true for a search hit,

and it doesn't matter if it's a power of two,

you still get the same results.

So these are all kind of interesting exercises and discrete math,

but you can see that the running time is going to be

proportional to log N for whatever end is.

And you can learn more about this,

maybe in a course on algorithms.

But understanding it in terms of when n

is a power of two minus one is fine for our purposes.

So our model is that the cost of a search is going to be about

log base two of N. That's the point of this mathematical analysis.

And again, we want to test this empirically.

And Alice has already signed up for the course on algorithms.

So let's do our empirical test for the scenario that we considered for sequential search.

And again, we'll do the same story,

and now we'll test binary search, not sequential search.

So in this case,

for 100,000 it takes just one second.

For 200,000 three seconds.

For 400,000 six seconds.

And now our ratio is a lot closer to two than four,

and we can go up to 1.6 million,

and it's only taken 30 seconds.

Remember, for sequential search it was taking hours.

And so for 10 million you can do it in a few minutes,

and the ratio is about two and we're at about 50,000 transactions per second in holding.

So that validates the hypothesis that each of

the N searches takes time proportional to N log N,

that's something that will scale.

And binary search is certainly

much more effective method for solving this problem than sequential search.

But the question is, how do we get the list in the sorted order at the beginning?

That's the thing that we're going to have to consider next.