0:01

Welcome back. Next we're going to talk about binary search trees, a classic data structure that enables us to provide efficient implementations of the symbol-table algorithms. Let's look at the basic binary search tree data structure. With heaps, we talked about an implicit representation of trees with an array. For binary search trees, we're going to work with actual explicit tree data structures. A binary search tree is a binary tree in symmetric order. Let's look at the meaning of those words. So, a binary tree is an explicit data structure.

It's got what we call nodes, which contain information, and every node has two links to binary trees that are disjoint from one another: a left tree and a right tree associated with each node. Links can be null: the left tree can be null, the right tree can be null, or both. We refer to every node in the tree as the root of a subtree, and we refer to the nodes below each node as its children; so this is the right child of the root, and that's a left link, and so forth. So that's the definition of a binary tree.

A binary search tree adds symmetric order: each node has a key, and every node's key is larger than all the keys in its left subtree and smaller than all the keys in its right subtree. This is a different ordering than we had for heaps, where every node is larger than both its children. Here, the value of every node is between the values of the nodes in its two subtrees; so the nodes to the left of E are smaller, and the nodes to the right of E are larger. Now, we're going to use these to implement symbol tables, so there are values associated with each key. When appropriate, we'll write the values in smaller numbers next to the keys, but usually we're just going to worry about the keys, and we'll keep the values in the nodes along with them. So that's binary search trees: a binary tree in symmetric order. That's the data structure that we're going to use to implement symbol-table operations.

So how are we going to represent binary search trees in Java? Well, we're going to extend our implementations of linked structures to have two references instead of just one. First of all, there's a node at the root, so a binary search tree in Java is just going to be a reference to a root node. And every node's got four fields: a key and a value, and references to the left subtree, which contains the smaller keys, and the right subtree, which contains the larger keys. So here's what the code is based on. The inner class that we use to implement nodes has four instance variables.
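The slide itself isn't shown in the transcript; here is a sketch of that inner class, consistent with the description (the names BST, Key, Value, and Node follow the lecture's conventions):

```java
public class BST<Key extends Comparable<Key>, Value> {
    private Node root;  // reference to the root node (null for an empty tree)

    private class Node {
        private Key key;           // the key
        private Value val;         // the associated value
        private Node left, right;  // links to the left and right subtrees

        public Node(Key key, Value val) {
            this.key = key;
            this.val = val;
            // left and right links are initialized to null by default
        }
    }
}
```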

All of the instance variables are private, as usual: a key of type Key, a value of type Value, and then references to a left and a right node. For convenience, we'll provide a constructor that takes the key and value as arguments and fills in the key and value instance variables; the left and right links are initialized to null. Our data structure, then, will be a root that points to a node in the tree, and that node will point to subtrees, and that will be the data structure that we use for symbol tables.

So here's the skeleton of our symbol table implementation. It's for comparable keys associated with values, and those are both generic types. The only instance variable is a link to the root node, called root. The inner class Node is the code that was given on the previous slide, and then we'll need implementations of put and get, and we'll also look at an implementation of delete, and an iterator as well. So that's our skeleton implementation; let's look at the operations.

Let's look at search first. So here's a binary search tree; let's do a demo of what different searches will look like in this tree. S is at the root: everybody to the left of S is less than S, everybody to the right is bigger. So this is a dynamic data structure that follows the same rule as binary search. To search for the key H in this tree, we start at the root and compare our key against the key at the root. In this case, H is less, so all that says to us is that if H is in the tree, it has to be to the left, because everybody to the right is greater than S. So we move to the left and compare H against the root of the left subtree; in this case, that's E. Now H is greater, so that says we should go right. Now we compare H against the root of the right subtree of E, and that's R, and it's less, so we have to go left, because everybody to the right of R is bigger and H is smaller. Eventually, if the key is in the tree, we're going to hit it. In this case, we find H as the left child of R. That's a search hit, and then for the get operation, we can return the value that's stored in that node along with the key H. What about an unsuccessful search?

Well, the same rules apply. If the key is in the tree, it's got to be to the left or to the right, according to whether it's smaller or larger than the key at the root. In this case, if we're searching for G, we've got to go left, because it's less than S. When we come against E, we've got to go right, because it's bigger than E; against R, we have to go left, because it's less than R; against H, we have to go left again. And then we come off a null link, and all that says is that there's no place in this tree where G could be, so G is not there. That's a search miss, and the get operation would return null in that case. What about insertion? Well, to insert a new key, all we do is follow the same steps as we did for search, following off that null link. Again, for G, we travel down the tree until we come to the null link. Really, what we're saying is, when we go to the left link of H, it says: if G is in the tree, it has to be down this link.

Since it's not there, to insert G all we need to do is put it there, and that's how we insert a new node into a binary search tree.

All right, here's the code corresponding to the process that we just demoed. It's quite straightforward, simple code, as simple as binary search, really. We set a variable x to be the root, and that's going to be the pointer to the current node as we move down the tree. As long as our current node x is not null, what we want to do is a comparison between the key at node x and our search key. If our search key is less, we go to the left. If it's greater, we go to the right. And if it's equal, we don't even have to test that (that's why it's in gray): if it's not greater or less, it has to be equal, and then we return the value right at that node. If we get to the bottom and our current node is null, that's falling off the bottom of the tree; we return null, and that's equivalent to saying, by our convention, that the key is not in our symbol table. So that's a very straightforward implementation of the get operation for symbol tables with a binary search tree representation. Now, what's the cost? Well, we went down a path in the tree, so it's one plus the depth of the node in the tree.

So what about put? For put, there are two cases. We're supposed to associate a value with a key: if the key's already in the tree, then we just reset the value; if the key's not in the tree, then we add a new node at the bottom. Now, it's a little bit tricky the way that we implement it, because we use a recursive implementation. And the reason we do this is that it generalizes to give us more efficient data structures later on. So what we'll do is use a recursive implementation that, as it moves down the tree, returns a link up higher in the tree. And so when we insert a new node, say L, in this tree, we go down that path, we create a new node, and then we return the link to that node higher up. There are ways to implement this that don't involve recursion, but the code is so simple, and it extends to more powerful data structures later on, that we'll introduce it right now, and you'll see how it works.

So this is very concise recursive code, but it's tricky because of that last point, so it's worth reading carefully. We're going to use a recursive method put that associates a value with a key in the tree, and that recursive method is going to return a node. The client method put, of course, is just supposed to do the association, so it has a void return type. But what we'll do is invoke the recursive method starting at the root, and whatever link gets returned, we'll set that to be root. Right away, we can see how that works: suppose we have an empty tree, where root is null. Then we call the recursive put with null as its first argument. What that method does is say: if the argument is null, return a reference to a new node that associates key with value and has null links. So in this case, that first call will return a link, and whatever link gets returned will be set to root. So, without any extra special-case code, we insert a node into an empty tree.

And that works recursively, too. Say we have one node in the tree, and we have a new key to associate, and let's say that key is less than the key at the root. Now we call put with a link to that one node that's got two null links. It's not null, so we compare our key against the key in that node. If that comparison comes out less, here's how we do the insert: we change our left link, which right now is null, to whatever put returns. So what's put going to return? Well, that left link is null, so in that call x is null: a new node gets created, and the link to that node is returned, and that's the link that we'll put in on the left. This is very concise code; otherwise we'd have various cases about saving which link we went down in order to reset it later on. Having looked at those small examples, the best way to understand this code is recursively. Let's believe that it works for small cases, which we have just done.
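Putting the pieces together, the get loop and the recursive put described above might look like the following sketch (in the style of the lecture's skeleton; delete and the iterator are omitted here):

```java
public class BST<Key extends Comparable<Key>, Value> {
    private Node root;  // root of the BST (null for an empty tree)

    private class Node {
        private Key key;
        private Value val;
        private Node left, right;
        public Node(Key key, Value val) { this.key = key; this.val = val; }
    }

    // Iterative search: go down one path, comparing as we go.
    public Value get(Key key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else              return x.val;  // search hit
        }
        return null;                         // search miss: fell off the bottom
    }

    // Client put: whatever link the recursive call returns becomes the new root.
    public void put(Key key, Value val) {
        root = put(root, key, val);
    }

    // Recursive put: returns the (possibly new) link for this subtree.
    private Node put(Node x, Key key, Value val) {
        if (x == null) return new Node(key, val);  // fell off a null link: new node
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else              x.val   = val;           // key already present: reset value
        return x;                                  // link to this (unchanged) node
    }
}
```

For example, after putting the keys S, E, A, R, C, H, a call to get("H") is a search hit, while get("G") returns null, the search miss from the demo.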

So, let's see what the code does. If x is null, we want to create a new node and return the link to that node. So even if it's a huge tree, down at the bottom we just came off a null link; we just want to create a new node with our new key and return a link to that node. That's all we want to do. Now, we can assume that put is going to return a link to a subtree that contains our new key. If our new key is smaller than the key at the node that we're processing now, then we want to insert the new key and value in the subtree on the left; otherwise, we want to insert on the right. Most of the time, the link that we get back will be the same as the link that we put in, but for the bottom node it will be different. So, if put works properly inserting a new node on the left, then that's what we want our left link to be; if it works properly inserting in the subtree on the right, that's what we want our right link to be. And by the way, if we find a key that's already in the tree, then we just reset the value. In all of the cases where we're at a node that already existed, we just return the link to that node; again, when we look at more sophisticated data structures, we'll be returning something else. So it's worthwhile checking that you believe this code implements the simple binary search tree algorithm that we demoed, where, when we fall off a null link, we create a new node and replace that null link with a link to the new node. So that's insert for a binary search tree in a symbol table. And again, the cost: the number of compares is equal to one plus the depth of the node; we just go down a path in the tree.

Now, what's interesting about binary search trees is that there are many different binary search trees that correspond to the same set of keys. The number of compares is going to depend on the order in which the keys come in, and that's a key feature of binary search trees that we'll come back to when we look at more sophisticated data structures. So it depends on how the keys come in. In the best case, the shape of the tree would be perfectly balanced, and one of the things we'll look at is algorithms that come very, very close to achieving that goal. In the typical case, it'll be sort of balanced. But one problem arises if the keys come in really unfortunately, in a natural order. If they come in in sorted order, that's the worst case: we get no benefit from having a tree shape at all. It's no different than a linked list.

We'll come back to dealing with that worst case in the next lecture. But the point is, the tree shape depends on the order of insertion. Now let's visualize what happens when keys come in in random order. The tree grows from the bottom, with a little side-to-side motion, just making room for each new key as it's added. But you can see that even for this case, which is hundreds of keys, the length of the path from top to bottom is not so much. In this case, the maximum distance from the top to the bottom is sixteen, the average is only nine, and the best you could do, in a perfectly balanced tree, would be seven. So it's pretty well balanced, which means that our search and insert cost in this case, for 255 keys, is at most sixteen compares: quite a bit less than 255.

One remark before we do the analysis: binary search trees correspond exactly to quicksort partitioning. In a binary search tree, we have a node at the root with everybody smaller to the left and everybody larger to the right. In quicksort partitioning, after the random shuffling, we have the partitioning element, and then we process everybody to the left independently of everybody to the right. So there's a direct correspondence: if there are no duplicate keys, there's a one-to-one correspondence between what happens with quicksort and what happens with binary search trees. And we point that out because it helps with the mathematical analysis. In fact, this correspondence with quicksort partitioning tells us we can take that proof and prove that if you insert N distinct keys into a BST in random order, then the expected number of compares for a search or an insert is two natural log N, so about 1.39 log base two of N, almost the best that you could do. It has also been shown, actually not that long ago, that the expected height of the tree under random-order insertion is about four natural log N; the height is the worst-case length of a path in the tree, the worst over all the keys, whereas the first bound is about the average path. So, if you have the keys in random order, the binary search tree gives efficient search and insert.
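To see the effect of insertion order described here, we can run a small experiment (a hypothetical demo, not from the lecture: BSTHeight, put, height, and heightAfterInserting are names made up for illustration). It inserts the same 255 keys once in sorted order and once shuffled, and measures the height of the resulting tree, counting the nodes on the longest root-to-null path:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BSTHeight {
    private static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Standard BST insert, the same recursive scheme as the symbol-table put.
    private static Node put(Node x, int key) {
        if (x == null) return new Node(key);
        if      (key < x.key) x.left  = put(x.left,  key);
        else if (key > x.key) x.right = put(x.right, key);
        return x;
    }

    // Height = number of nodes on the longest path from the root to a null link.
    private static int height(Node x) {
        if (x == null) return 0;
        return 1 + Math.max(height(x.left), height(x.right));
    }

    public static int heightAfterInserting(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = put(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= 255; i++) keys.add(i);

        // Sorted order: every insert goes right, so the tree is a path of 255 nodes.
        System.out.println("sorted:   " + heightAfterInserting(keys));  // prints 255

        // Random order: expect roughly 4 ln 255, around 22, far from 255
        // (a perfectly balanced tree of 255 keys has 8 levels by this count).
        Collections.shuffle(keys);
        System.out.println("shuffled: " + heightAfterInserting(keys));
    }
}
```

Sorted-order insertion really does degenerate to the linked-list shape, while a random order stays close to balanced.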

But there is this problem: in the actual worst case, if the keys come in in order, in reverse order, or in other natural orders, the height means the time could be proportional to N. So we have this summary, which is looking pretty good, because the average case for both operations, search and insert, is 1.39 lg N, and that's probabilistic: if the keys arrive in random order, it's extremely likely to hold. But the problem, by comparison with sorting, is that we don't get to randomize the order in which the client provides the keys. So we're going to need something better than randomly ordering the keys to provide the guarantee.

That's what we'll be looking at next, when we look at more efficient algorithms. But first we're going to look at the implementation of the ordered operations with the binary search tree structure. It expands, like binary search, to handle all these convenient client operations in a very natural manner.

That's the implementation of binary search trees for symbol tables. They give efficient implementations of both search and insert.
