0:02

Another popular collision resolution method is known as linear probing. You know, many different versions of hashing are based on this idea. Linear probing is a form of what's called open addressing, and it's also from around the same time, the 1950s. The idea is to just use an array: instead of using space for the links in a linked list, we use that same space to allocate an array. In this case, the size of the array is going to have to be bigger than the number of keys that we expect, and we use the empty slots in the array to essentially terminate the runs of keys that we have to search through when we're doing an insertion. So let's look at a demo of how it works. To hash, we do the same thing: we just map the key to an index. But in linear probing, to insert, what we do is put the key in position i if that's free; if not, we just look at i+1, and i+2, and wrap around to the beginning if we reach the end. That's also simple to implement, and it works well as long as the size of the array is significantly bigger than the number of keys.

Well, here's the demo. So we start with an empty table. Insert S: its hash value is six, six is empty, so we put it there. Now we look at E: the hash of E is ten, we look at ten, it's empty, so we put E there. So at the beginning we're going to be fine. A is four, empty, put it there. R is fourteen, empty, put it there. So we're just essentially using the hash function as an array index. C is five, that's empty, and we put it there. Now H: the hash value of H is four. So now we look at four, and that's occupied, so we can't put the H there. And linear probing says: just look at the next position, look at five. That's still not empty. So we look at six.

And we keep going till we find an empty place, and then we put H there. Now when we search, we're going to have to do the same thing: we're going to have to look at all those positions to look for H. That group of four contiguous keys in the table is called a cluster, and clearly we want to keep those clusters small. And we do that just by not putting too many keys into the table. So X hashes to fifteen; that's empty, so we put it there. M hashes to one; that's empty, and we put it there. P hashes to fourteen; fourteen's occupied, fifteen's also occupied, now we run off the end of the table and look at zero, and that's empty, so we put it there. L hashes to six. Six is occupied. We look at seven; seven is occupied. We look at eight, and we put it there. So that's an example of inserting keys into a hash table. And now, for a search, we just do the same thing: we use the hash function. To search for E, E's hash value is ten, so we look in ten, and there it is. So that's a search hit. If we're going to search for, say, L: L's hash value is six, and L's not there. So in order to look at every place in the table where L could be, we have to keep looking till we find an empty table position or we find L itself. So now we look at seven, L's not there; we look at eight, L is there, and that's a search hit. If we have a key that's not in the table, like K: K hashes to position five. Not there; six, no; seven, no; eight, no; and then we find an empty position, and at that point we can conclude that K is not in the table.

Because if K were in the table, it would be somewhere between its hash position five and that empty position nine. That's a search miss, and we return null. So that's a short demo of linear probing hashing. Here's a summary of linear probing. To get started, we map each key to an integer between 0 and m-1, where m is the size of the array where we're storing the keys. To insert, we put the key-value pair into the table.
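As a quick check, the insertion demo above can be reproduced with a short sketch. The per-key hash values are hardcoded to match the lecture's demo for a table of size 16; this is illustrative code, not the course's implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class DemoTrace {
    // Hash values as used in the lecture's demo (table size 16), hardcoded.
    static final int M = 16;
    static final Map<Character, Integer> HASH = new HashMap<>();
    static Character[] table = new Character[M];

    static {
        HASH.put('S', 6);  HASH.put('E', 10); HASH.put('A', 4);
        HASH.put('R', 14); HASH.put('C', 5);  HASH.put('H', 4);
        HASH.put('X', 15); HASH.put('M', 1);  HASH.put('P', 14);
        HASH.put('L', 6);
    }

    // Linear-probing insert (distinct keys only): try i, i+1, i+2, ...
    // wrapping around to 0 when we run off the end of the table.
    static void insert(char key) {
        int i = HASH.get(key);
        while (table[i] != null) i = (i + 1) % M;
        table[i] = key;
    }

    public static void main(String[] args) {
        for (char c : "SEARCHXMPL".toCharArray()) insert(c);
        // As in the demo: H ends up in slot 7, P wraps around into slot 0,
        // and L lands in slot 8.
    }
}
```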

We use parallel arrays, a key array and a value array, with the same index. We put the entry at table index i if that position is free; if not, we try i+1, i+2, and so on until getting to an empty position. And for search, you do the same thing: you hash to the table position and you look there, and then to the right, to find the key, and you stop when you find the key or find an empty table position. Now, it's essential that the array size m is greater than the number of key-value pairs N. And for linear probing hashing, really, the implementation needs to include array resizing whenever the hash table gets too full.

Here's the implementation, and it's quite straightforward given the demo that we talked about. We use the same hash function, and we use parallel arrays for the keys and the values, and we have to use an ugly cast because we can't have arrays of generics. Then let's do the search. We just have a for loop starting at hash of the key and going until we get to a position that's null. As long as it's not null, we stay in the loop and increment i mod m; that way, when i gets to the end of the table, at position m-1, the next increment takes it back to zero at the left end of the table. For each non-null key we test whether it's equal to the search key; if it is, we go ahead and return the associated value, and if we get to an empty position, we return null. The implementation of put is similar: find the position, and if there's a key there that's equal, reset the value. If the key's there, it just resets the value; if the key's not there, it puts a new entry in. So again, that's fairly little code to implement a fast symbol table with search and insert. But it's only going to be fast if the table size is set appropriately.

In ancient times, memory was at quite a premium, and so people were very concerned with making sure that the hash table never got too empty. Remember, in the first computers each bit was a physical thing, a magnetic core that somebody had to string a wire through. The bits were really expensive, and people wanted to make sure that they were making the best use of the memory, and just leaving empty positions around in a hash table, or using links in a linked list, did not seem like an appropriate use of space. And so there was quite a bit of effort devoted to figuring out how full we could get the hash table in linear probing, and how close it could get to full, without sacrificing performance. And one way to think about what goes on is to just watch what happens when a hash table fills up.
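One way to watch that happen is with a quick simulation. This sketch (my own illustrative code, with random keys and a fixed seed) counts the probes each insertion needs as the load factor grows:

```java
import java.util.Random;

public class FillDemo {
    // Count the probes each insertion needs as a linear-probing table of
    // size m fills up. Random keys, fixed seed; names are illustrative.
    static int[] probeCounts(int m, int n, long seed) {
        Random rnd = new Random(seed);
        boolean[] used = new boolean[m];
        int[] counts = new int[n];
        for (int k = 0; k < n; k++) {
            long key = rnd.nextLong();
            int i = (int) (((key % m) + m) % m);   // hash: key mod m
            int probes = 1;
            while (used[i]) { i = (i + 1) % m; probes++; }
            used[i] = true;
            counts[k] = probes;
        }
        return counts;
    }

    static double avg(int[] a, int from, int to) {  // mean of a[from..to)
        double s = 0;
        for (int k = from; k < to; k++) s += a[k];
        return s / (to - from);
    }

    public static void main(String[] args) {
        int m = 2048;
        int[] c = probeCounts(m, (int) (0.9 * m), 42L);
        // Inserts made while the table is under half full stay cheap; the
        // ones that push it from half full toward 90% full cost far more.
        System.out.println(avg(c, 0, m / 2) + " vs " + avg(c, m / 2, c.length));
    }
}
```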

So here, as the table fills up, we're showing each key getting inserted and the number of probes into the table that are needed for the insertion; when J hashes to the same position as A, say, you have to look for a while. And the thing to notice as the table gets full is, first of all, that you have these clusters, or chains, building up. What's clear about that is that a new key is likely to hash into a big cluster. And not only that: once you have a big cluster and you hash into the middle of it, you've got a good chance that that cluster is going to get longer, or, worse, that it's even going to merge with another big cluster. So that's the situation as the table fills up: you get long clusters, and they're likely to get longer. And the math bears that out. Now, this was studied in detail by Knuth, Don Knuth, in the 1960s, and actually this problem, Knuth says, was the origin of the analysis of algorithms.

Mathematicians were trying hard to understand this problem and were ready to give up, and he realized you could use classical balls-and-bins type probabilistic analysis. It's not an easy analysis, but we actually can make precise, accurate statements about the performance of this algorithm. And those statements are borne out in practice: the hash functions approximate random, the math assumes random, and the formulas predict what actually happens in practice.

Now, one way to formulate the problem is as the so-called parking problem. What happens is that you're on a one-way street and you're looking for a parking place. The idea is that each car starts looking for a parking place at a particular spot, saying "Okay, now I need a parking place," and what it does is linear probing: if the current space is taken, you try the next space, and the one after, and so forth. And the question is: if every car starts looking for a place at a random spot (that models the hash function), then how far does it have to go to find a place? That's Knuth's parking problem. And he was able to show, and we'll talk just a little bit about this, that if only half of the parking spaces are occupied, then, on average, half the cars find a place immediately, and the other half only have to look about one extra place. So that's the kind of performance that we want. But as the street gets full, the displacement gets up to about the square root of pi M over eight, which is obviously much higher than we want; we don't want our searches to take that long. And the analysis, actually, is an amazing piece of mathematics involving the Ramanujan Q-function and other classical results from combinatorial analysis.

What Knuth's theorem says is that, under the uniform hashing assumption, the average number of probes in a linear probing hash table of size M that is alpha percent full (so the number of keys N is alpha times M) is about one-half times (1 + 1/(1 - alpha)) for a search hit, and one-half times (1 + 1/(1 - alpha)^2) for a search miss or an insert. Now, as alpha gets close to one, you can see these quantities are going to grow, and particularly the search miss cost is going to grow quite a bit. If the table is 9/10 full, then 1/(1 - alpha)^2 is 100, so it's going to take about 50 probes for a search miss, and that's independent of N and M. Whereas if it's half full, then we get the nice numbers of only about 3/2 probes for a hit and only 5/2 for a miss. And again, these are nice approximate formulas. Knuth, once he figured this out in 1963, tells stories that at that time he decided to write his famous series of books on algorithms.
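Knuth's estimates above are easy to evaluate directly; a quick sketch (method names are mine):

```java
public class KnuthFormulas {
    // Knuth's linear-probing estimates under uniform hashing,
    // as functions of the load factor alpha = N/M:
    static double searchHit(double alpha)  { return 0.5 * (1 + 1 / (1 - alpha)); }
    static double searchMiss(double alpha) { return 0.5 * (1 + 1 / ((1 - alpha) * (1 - alpha))); }

    public static void main(String[] args) {
        System.out.println(searchHit(0.5));   // 1.5 probes at half full
        System.out.println(searchMiss(0.5));  // 2.5
        System.out.println(searchHit(0.9));   // ~5.5 at 9/10 full
        System.out.println(searchMiss(0.9));  // ~50.5, i.e. about 50 probes
    }
}
```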

Now there are four volumes out and more planned, and this is where all computer scientists go for detailed information on the performance of algorithms. So, in summary: you can't have M too large. What we want to use nowadays is array resizing, to make sure that the array is always about half full. And if we can keep the array about half full, then we get constant-time performance for search hit and search miss, and linear probing is very attractive in this case. There are other things that we can do algorithmically to bring down the search time a little bit, like using another hash function: rather than looking at the next entry, use a second hash function to determine the stride that we're going to use. And that brings it down somewhat and allows us to keep the tables more full.
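That stride idea is usually called double hashing, and it can be sketched as follows. This is illustrative code, not the course's implementation: it assumes a prime table size (so every nonzero stride cycles through all slots), int keys, and a table that never gets completely full.

```java
public class DoubleHashDemo {
    // Sketch of the "second hash chooses the stride" idea (double hashing).
    // M is prime, and hash2 never returns 0, so every probe sequence
    // eventually visits all slots.
    static final int M = 13;
    static Integer[] keys = new Integer[M];

    static int hash1(int k) { return ((k % M) + M) % M; }
    static int hash2(int k) { return 1 + ((k % (M - 1)) + (M - 1)) % (M - 1); }

    static void insert(int k) {
        int i = hash1(k), stride = hash2(k);
        while (keys[i] != null && keys[i] != k) i = (i + stride) % M;
        keys[i] = k;
    }

    // Assumes the table is never full, so a miss always hits an empty slot.
    static boolean contains(int k) {
        int i = hash1(k), stride = hash2(k);
        while (keys[i] != null) {
            if (keys[i] == k) return true;
            i = (i + stride) % M;
        }
        return false;
    }
}
```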

But the bottom line is that now we have two methods that, under the uniform hashing assumption, can give us constant-time search, insert, and delete for symbol table implementations where we don't need ordering and we've got a reasonable hash function. So that's a summary of linear probing, our second collision resolution strategy.
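For reference, the get/put implementation described earlier might look like this minimal sketch: parallel arrays, the ugly cast, and no array resizing (which, as noted above, a real implementation needs, so M must stay well above the number of keys here):

```java
public class LinearProbingST<Key, Value> {
    // Minimal linear-probing symbol table: parallel arrays, no resizing.
    private final int m;
    private final Key[] keys;
    private final Value[] vals;

    @SuppressWarnings("unchecked")
    public LinearProbingST(int capacity) {
        m = capacity;
        keys = (Key[]) new Object[m];   // the "ugly cast": no arrays of generics
        vals = (Value[]) new Object[m];
    }

    private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % m; }

    public void put(Key key, Value val) {
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) break;   // key present: reset its value
        keys[i] = key;
        vals[i] = val;
    }

    public Value get(Key key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) return vals[i];
        return null;                          // empty slot: a search miss
    }
}
```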
