0:16

Well, one thing we could do is maintain a linked list.

Â We could keep it in order or keep it unordered.

Â This version keeps it unordered.

Â So we're going to have nodes in the linked list that have key value pairs.

Â They have every key in the symbol table and a value associated with that key.

Â For search, we have to, since it's unordered,

Â scan through the whole list to find a match, a key that's there.

Â For insert, I would also have to scan through all keys to find

Â the place to update a value, if it's a value that's already there.

Â And if there's no match, we could add it to the front.

Â So here's our simple client for traces,

Â so if we associate S with zero, we just had it.

Â That's our one node linked list that's got that information.

Â Associate E with 1, that's not there, so we just add it to the beginning

Â of list A with 2 R, with 3C, with 4 H, with 5 and so forth.

Â So now when we associate E with 6,

Â we have to search through the list to see if there's an E.

Â 1:35

It's possible to implement symbol tables that allow multiple values with

Â the same key and so forth.

Â And that leads to different types of clients,

Â different types of implementations.

Â We're going to stick to this associative array abstraction and

Â no duplicate keys in the symbol table,

Â because it both simplifies implementations and leads to simpler client code.

Â 1:58

Okay, X7 is a new value.

Â A8, we found A in there and update the value 8.

Â And then, M9 and P10, L11 are all not there and they go at the beginning.

Â And then the last one changes the value at E, again, 12.

Â So implementing this is a simple example

Â of linked list processing, a slight modification of our stack and queue code.

Â 2:49

And if everything's random,

Â then on average you only have to look halfway through for a successful search.

Â Well, you still have to insert.

Â Another issue is, for many clients, if the keys are ordered,

Â it's nice to be able to iterate through the symbol table and order.

Â And this one, by definition, doesn't provide that.

Â And this one just uses equals, so the keys don't have to be comparable for this.

Â It just uses equals.

Â So our challenge is to look for methods that

Â give us more efficient implementations for these search and insert operations.

Â And we've already looked at an algorithm that can do this, and

Â that's binary search.

Â 3:58

And then find the index associated with the key that we're searching for

Â using binary search.

Â And then use that index to get the value that's associated with that key,

Â that's stored in a parallel array.

Â And we looked at the binary search algorithm earlier in the course.

Â And so, for example, if these are the keys in our symbol table and

Â we're doing a search for the index where P is stored, we look at the middle.

Â P is bigger than L, so we look to the right.

Â Look in the middle of the right half.

Â P is less than R, so we look to the left.

Â Continue until we find P.

Â When we find P, we return its index and

Â we use that index to get us the value that we need.

Â Or another way to look at this is it implements the function,

Â how many keys are there that are less than K.

Â So, for example, for Q, that's an unsuccessful search.

Â You can figure out from the last index

Â when you don't find your element that you're seeking.

Â You can figure out the return value which is the number of

Â keys that are less than it.

Â So that's a trace of implementing binary search to find the rank of a key in

Â ordered array.

Â And again, for successful, you can use that rank to return the value.

Â And if it's unsuccessful,

Â you can use that rank to figure out where to insert the new key.

Â [COUGH] All right, so this is the code for

Â the get operation and this rank which is binary search.

Â So this is precisely the binary search code that we looked at before.

Â So let's look at the gap.

Â So if the whole table is empty, return null.

Â Otherwise, we call rank and

Â that gives us a number of keys less than the current key.

Â And so, that is where we look to check to see if that key is there.

Â If it's there, then we return the value with the same index and parallel array.

Â 6:20

Now, the problem with binary search is, well, not necessarily the problem,

Â but the situation is that when it's time to insert a new element,

Â we have to move everything larger over one position, just like an insertion sort.

Â So if the table has A, E, R, and S, and we have to insert the value C,

Â then we have to move the E, R, and S over one position to put the C.

Â And then put the value associated with C.

Â You'll have to do the same thing in the vals array.

Â Move all the values associated with those keys over in one position and

Â put the associated value in.

Â So this is a trace of what would happen for our trace.

Â And again, every insertion involves

Â making a new position by moving all the larger keys over one position.

Â Do the same thing in the vals array.

Â [COUGH] And if it's changing the value

Â associated with a key that's already there, then it's just a matter of

Â finding where the key is and changing the value at that index.

Â So from that trace,

Â it's pretty easy to see what's involved for the code.

Â And we'll skip that code and

Â just take a look at the comparison between this elementary implementation for

Â symbol tables with the sequential search in an unordered list.

Â So, one thing is we're using a different key interface.

Â We're taking advantage of the fact that the keys are comparable

Â to give us an efficient search.

Â We can do search in worst case, in average case, in time proportional to log N.

Â That's what binary search provides for us.

Â And this is a fine data structure for symbol tables where there is,

Â [COUGH] that are relatively static,

Â where the values don't change much, and most of the operations are search.

Â It's hard to beat binary search.

Â On the other hand, in a dynamic situation where there are a lot of inserts,

Â this method is going to be problematic, because the cost of its insert is linear.

Â It's proportional to N over 2.

Â If you have a huge number of operations and

Â every one of them is proportional to the symbol table size,

Â then you're just not going to be able to support huge numbers of keys.

Â What we want is sufficient implementations of both search and insert.

Â