Welcome back. Next we're going to talk about Binary Search Trees, a classic data structures that'll enables us to provide efficient implementation of symbol table and out rhythms. [cough] Let's look at the basic Binary Search Tree data structure with Heaps we talk about implicit representation of trees with an array. For Binary Search Trees we're going to talk about actually explicit tree data structures. A binary search tree is a binary tree in symmetric order. Let's look at the meaning of those words. So, a binary tree is an explicit data structure. It's got what we call nodes which contain information and every node's got two links to Binary Trees that are disjoint from one another, a left tree and right tree associated with each node. Each node has a left tree and a right tree. Links can be null. Left tree can be null and right tree can be null or both. We refer to every node in the tree as the root of a sub-tree and [cough] referred to, the nodes below. Each node is its children so this is the right child of the root. And that's a left link, and so forth. So that's the definition of a binary tree. A binary search tree, each node has a key and every nodes key is larger than all the keys in its left subtree and smaller than all the keys in its right subtree. This is a different ordering than we have to heap if we have a node larger than both its children, this one, every node is between the values, the value of every node is between the values of the nodes in its two subtrees. So the nodes to the left of e are smaller and nodes to the right of e are larger. [cough] now we're going to use these to implement symbol tables and there's values associated with each key when appropriate, we'll write the values in smaller numbers next to the keys. But usually, we're just going to worry about the keys and we'll keep the values, in the nodes along with them [cough] so that's binary search trees. A binary tree in symmetric order that's the data structur e that we're going to use to implement symbol table operations. So how are we're going to represent binary search trees in Java? Well, we're going to extend our implementations of linked list structures to have two references instead of just one. So first of all, is the, there's a node at the root. So, a binary search tree in Java is just going to be referenced to a root node. And every node's got four fields, a key and a value, and references to the left subtree, that contains the smaller keys, and the right subtree that contains the larger keys. So, here's what the, [cough] code is based on. The, inner class that we used to implement nodes has, one, two, three, four instance variables. All of which are private as usual. A key of type key, a value of type value and then references to a left and a right node. For convenience, we'll provide a constructor that takes the key and value as argument and fills in the key and value instance variables then the left and right links are initialized to null. And our data structure then will be a root that points to a node in the tree and then that node will point to subtrees and that will be the data structure that we use for symbol tables. So here's the skeleton of our symbol table implementation. It's for comparable keys associated with values and those are both generic types. The only instance variable is a link to the root node called root. The inner class node is the code that was given on the previous slide, and then we'll need implementations of put and get, and we'll also look at an implementation of delete, and an iterator as well. So that's our skeleton implementation, let's look at the keys. So let's look at search first. So here's a binary search tree let's do a demo of what different searches will look like in this tree. So there's a tree so S is at the root everybody to the left of is less than S over to the right is bigger. So this is a dynamic data structure that kind of follows the same rule as binary search. So to look for a, to do a search for the key H in this tree, we start at the root and we compare our key against the key at the root. And in this case, H is less so all that says to us is that if H is in the tree, it has to be to the left cuz everybody to the right is greater than S. So we move to the left and compare H against the root of the left subtree. In this case that's E. Now H is greater so that says we should go right. Now we can pair H against the root of the right subtree of E, and that's R and it's less so we have to go left cuz everybody to the right of R is bigger and H is smaller. And eventually if the key is in the tree, we're going to hit it. In this case we, we find H as the left sub tree of R in [cough] that's a search hit and then for the get operation, we can return the value that's stored in that node along with the key H. What about an unsuccessful search? Well the same rules follow. If it's in the tree, it's gotta be according to the left or right, according to whether it's smaller or larger than the key at the route. In this case, if we're searching for G, it's gotta go left, because it's less than S. When we come against the E, we gotta go right because it's bigger than E against the R, we have to go left, because it's less than R. We come against the H, we have to go left. And then we come off a null link, and all that says is that there's no place in this tree where G could be so G is not there. So that's a search miss. And the get operation would return null in that case. What about insertion? Well, to insert a new key, all we do is follow the same steps as we did for search. That following off that null link and again, we'll just, for G, travel down the tree until we come to the, null link. Really, what we're saying is when we go to the left link of H, it says, if G is in the tree, it has to be down this link. Since it's not there to in sert G, all we need to do is just put it there and that's how we insert a new node into a binary search tree. Alright, here's the code corresponding to the process that we just demo. And it's quite straight forward simple code as simple as binary search really. We start at the root then we set variable x to be the root and that's going to be the pointer to the current node as we move down the tree. As long as our, our current node x is not null what we'll want to do is a comparison between the key at node x and our search key. If our search key is less, we go to the left. If it's greater we go to the right. And if it's equal we don't even have to test that, that's why it's in grey. If it's not greater or lesser it has to be equal, than we return the value right at that node. If we get to the bottom and our current node is null and that's falling off the bottom of the tree we return null and that's equivalent to saying our buyer convention that, that key is not in our data structure, or not in our symbol table. So that's very straightforward implementation of the get operation for symbol tables with a binary search tree representation. Now, what's the cost? Well, we went down a path in the tree so it's one plus the depth of the node in the tree. [cough] So what about search well search for put there's two cases. If the if we're supposed to associate a value with a key. If the key's already in the tree then we're just going to reset the value. If they key's not in the tree then we add a new node at the bottom. So now it's a little bit tricky the way that we implement it since we're using we use a recursive implementation. And the reason we do this is that it generalizes to give us more efficient data structures later on. So, what we'll do is use a recursive implementation that as it moves down the tree it'll return a link up higher in the tree. And so when we insert a new node L say in this tree we go down that path, we create a new node and then return the link to that node higher up. There's ways to implement that don't involve this, but its, the code is so simple and it extends to more powerful data structures later on that we'll introduce this right now and, and you'll see how it works. So here's the, this is very concise recursive code but its tricky because of that last point so its worth reading carefully. So, we're going to use a recursive method put. That put a associate a value with a key in the tree. And that recursive [cough] method is going to return a node. So the client method put of course, just is supposed to do the association so it has a void return. But what we'll do is invoke a recursive method starting at the root and whatever link gets returned, we'll set that to root. So right away, we can see, let's suppose that we have an empty tree where root is null. So then if we put with null as the first argument, then null is the first argument. What we do is we say if, if the argument is null, return a reference to a new node that associates key with value and then that one has null links. So in this case, that first call will return a link and whatever link gets returned, that will be set to root. So, without any extra special code we insert a node into an empty tree. And that works, again, recursively say we have one node in the tree, and we have a new key to associate. And let's say that key is less than the key at the root. So, so now we do put in it's actually a link to that one node that's got two null links. So it's not null so we'll compare our key against the key in that node. If that comparison comes out left, here's how we do the insert. We change our left link which is right now it's null to whatever put returns. So what's put going to return? Well, that left link is null, so what's going to happen is, in that call x is null. It's going to be cre ated a new node and the link to that node will be returned and that's the link that we'll put in the left. This is a very concise code that otherwise we'd have various cases about saving which link we went down in order to reset that later on. So now the best way having looked at those small examples, the best way to understand this code is recursively. Let's believe that it works for small cases which we have just done. So, lets see what the code does. So if x is null, we want to create a new node and return the link to that node. So, even if it's a huge tree down at the bottom, we just came of a null link. We just want to create a new node with our new key and return a link to that node, that's all we want to do. Now, we can assume that put is going to return a link to a sub-tree that contains our new key and if our new key is smaller than the key at the node that, that we're processing now, then [cough] we want to insert the new key value there and the new node on the left otherwise, we want to insert on the right. Most of the time, the link that we get back will be same as the link that we put in but for the bottom node it will be different. So, if put works properly inserting a new node on the left, then that's what we want our left link to be. If it works properly, putting in the subtree on the right, that's what we want our right link to be. And by the way, if we find a key that's in the tree already, then we just want to reset the value. And in all of these cases where we're on a node that already existed, we just want to return the link to that node. Again, when we look at more sophisticated values we'll be returning something else. So it's worthwhile you know, checking that you believe that this code implements the simple binary search tree algorithm that we demoed where when we fall off a null link we created a new node and replaced that null link with the new node . So that's insert for a binary search tree in a symbol table. And again, the cost of this is the number of compares is equal to one plus the depth of the node. We just go down a path in the tree. Now, what's interesting about binary search trees is that there are many different binary search trees that correspond to the same set of keys. So the number it compares is going to depend on the order in which the keys come in. And that's a key feature of binary search trees that we'll come back to again when we look at more sophisticated data structures. So it depends on how the keys come in. The shape of the, of the tree could be well in the best case so it would be perfectly balanced. And one of the things we'll look at is algorithms that come very, very close to achieving that goal. The typical, typical case it'll be sort of balanced. Now but one problem is if the keys come in and, and really unfortunately, if they come in, in a natural order. Like if they come in, in order, that's the worst case. We don't get any benefit from having it in a tree shape. It's no different than a link list. So we'll, we'll come back to dealing with that worse case in the next lecture. But the point is, the tree shape depends on the order of insertion. Now, but let's look at what's happened or visualize what happens when keys come in, in random order. So the tree grows from the bottom in the little side to side motion it's just accommodating room for each new key as it's added. But you could see that even for this case which is hundreds of keys, the length of the path from top to bottom is not so much. In this case, the maximum distance from the top to the bottom is sixteen the average is only nine and the best you could in a perfectly balanced tree it would be seven. So it's pretty well balanced which means that our search and insert cost in this case for 255 keys is only going to be sixteen quite a bit less. So one remark before we do the analysis is that actually binary search trees correspond exactly to Quicksort partitioning. In the binary search tree, we have a node at the root and we have everybody smaller to the left, and everybody larger to the right. In Quicksort partitioning, after the random shuffling we have the partitioning element and then we process everybody to the left independently of everybody to the right. So it's a [cough] there's a direct correspondence. If there is no duplicate keys Quicksort processes them and referred them out in BSTs and if there's no duplicate keys there's a one-to-one correspondence between what happens with Quicksort and what happens with binary search trees. And we point that out because that helps with the mathematical analysis. In fact, this correspondence with Quicksort partitioning tells us we can take that proof and prove that if you insert in distinct keys into a BST, in random order, then the expected number of compares for a search and an insert is two natural log N. So again about 1.38 log base two of N almost the best that you could do. It also has been shown actually not that long ago, that the expected height of the tree if they're inserted in random order, the height that's the worst case length of a path in the tree. This is the average path in a tree, this is the, the worst of all the keys. This is about four natural log N. So, if you have the keys in random order the binary search tree gives efficient search and insert. Now but there is this problem that the actual worst case height if the keys come in, in order and reverse order and other natural orders that the time could be proportional to n. So, we have this summary which is looking pretty good, because we have the average case for both operations, the search and insert, to be 1.39 log N and that's probabilistic if they are in random order, its extremely likely to be there. But the problem by comparison with sorting is, we don't get to randomize the order the client is providing the keys. So we're going to need something better to provide the guarantee than just randomly ordering the keys. That's what we're going to be looking at next when we look at more efficient algorithms. But first we're going to look at the implementation of order and operations with the binary search tree structure. It expands like binary search to handle all these convenient client operations in a very natural manner. That's the implementation of binary search trees for symbol tables. They give sufficient implementation of both search and insert.