0:00

So, in this video, we'll go over the basics behind implementing binary search trees. We are not going to focus on the balance aspect in this video; that will be discussed in later videos, and we are going to talk about things which are true for binary search trees in general, balanced or otherwise. But let's just recall why we are doing this, what the raison d'être of this data structure is. The balanced version of the binary search tree is basically a dynamic version of a sorted array. It supports pretty much everything you can do on a sorted array, maybe in slightly more expensive time. The operations are still really fast, but in addition, it's dynamic: it accommodates insertions and deletions. Remember, if you want to maintain a sorted array as your data structure, then every time you insert or delete, you're probably going to wind up paying a linear factor, which is way too expensive in most applications. By contrast, with a search tree, a balanced version, you can insert and delete in time logarithmic in the number of keys in the tree. Moreover, you can search in logarithmic time, no more expensive than binary search on a sorted array, and you can also solve special cases of the selection problem, namely computing the minimum or the maximum. Okay, it's not constant time like in a sorted array, but logarithmic is still pretty good. In addition, you can print out all of the keys from smallest to largest in linear time, constant time per element, just like you could with a linear scan through a sorted array. So, that's what they're good for: everything a sorted array can do, more or less, plus insertions and deletions, everything in logarithmic time. So, how are search trees organized?

Again, what I'm going to say in the rest of this video is true both for balanced and unbalanced search trees; we're going to worry about the balancing aspect in later videos. All right, so let me tell you the key ingredients of a binary search tree, and let me also draw a simple cartoon example in the upper right part of the slide. There's a one-to-one correspondence between nodes of the tree and keys that are being stored. As usual in our data structure discussions, we're going to act as if the only thing we care about, the only thing that exists at each node, is the key, when generally there's associated data that you really care about. So, each node in the tree will generally contain both the key plus a pointer to some data structure with more information. Maybe the key is an employee ID number, and then there's a pointer to lots of other information about that employee. Now, in addition to the nodes, you have to have links amongst the nodes, and there are a lot of different ways to implement the pointers that connect the nodes of the tree together, but in this video I'm just going to keep it as straightforward as possible and assume that each node has three pointers: one to a left child, another to the right child, and a third which points to the parent. Now, of course, some of these pointers can be null, and in fact, in the five-node binary search tree I've drawn on the right, for each of the five nodes, at least one of these three pointers is null.
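The node layout just described can be sketched in a few lines of code. This is a minimal sketch in Python, assuming we store only keys with no associated data; the names `Node`, `key`, `left`, `right`, and `parent` are chosen for illustration:

```python
class Node:
    """One node of a binary search tree: a key plus three pointers."""
    def __init__(self, key):
        self.key = key
        self.left = None    # pointer to left child (None if absent)
        self.right = None   # pointer to right child (None if absent)
        self.parent = None  # pointer to parent (None only at the root)

# Build the five-node example tree from the slide: root 3, left child 1
# (whose right child is 2), and right child 5 (whose left child is 4).
root = Node(3)
root.left = Node(1);        root.left.parent = root
root.right = Node(5);       root.right.parent = root
root.left.right = Node(2);  root.left.right.parent = root.left
root.right.left = Node(4);  root.right.left.parent = root.right
```

Note that every one of the five nodes ends up with at least one null pointer, matching the picture.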

So, for example, the node with key one has a null left child pointer; there is no left child. Its right child pointer points to the node with key two, and its parent pointer points to the node with key three. The root node, in this case the node with key three, is the unique node with a null parent pointer. The node with key three, of course, has a left child pointer that points to one and a right child pointer that points to five. Now, here is the most fundamental property of search trees.

Let's just go ahead and call it the Search Tree Property. The search tree property asserts the following condition at every single node of the search tree: if the node has some key value, then all of the keys stored in the left subtree should be less than that key, and similarly, all of the keys stored in the right subtree should be bigger than that key. So, suppose we have some node whose stored key value is x, somewhere, say, deep in the middle of the tree, where upward we think of as being toward the root. Think of the left subtree as all the nodes reachable after following the left child pointer from x, and similarly the right subtree as everything reachable via the right child pointer from x. Then it should be the case that all keys in the left subtree are less than x and all keys in the right subtree are bigger than x. And again, I want to emphasize that this property holds not just at the root but at every single node in the tree.

I've defined the search tree property assuming that all of the keys are distinct; that's why I wrote strictly less than in the left subtree and strictly bigger than in the right subtree. But search trees can easily accommodate duplicate keys as well. We just have to have some convention about how to handle ties. For example, you could say that everything in the left subtree is less than or equal to the key at a node, and everything in the right subtree is strictly bigger than that key. That works fine as well.

So, if this is the first time you've ever heard of the search tree property, maybe at first blush it seems a little arbitrary, as if I pulled it out of thin air. But actually, you would have reverse-engineered this property if you had sat down and thought about what property would make search really easy in a data structure. The point is, the search tree property tells you exactly where to look for any given key. Looking ahead a little bit, stealing my fire from a slide to come, suppose you were looking for, say, the key 23, and you start at the root, and the root is 17. The point of the search tree property is that you know where 23 has to be: if the root is 17 and you're looking for 23, then, if it's in the tree at all, there's no way it's in the left subtree; it's got to be in the right subtree. So, you can just follow the right child pointer and forget about the left subtree for the rest of the search. This is very much in the spirit of binary search, where you start in the middle of the array, compare what you're looking for to what's in the middle, and either way recurse on one of the two sides, forgetting forevermore about the other half of the array. That's exactly the point of the search tree property: as we search from the root on down, it guarantees we have a unique direction to go next, and we never have to worry about any of the stuff that we don't see.

We can also draw a very loose analogy with our discussion of heaps. You may recall that we also thought of heaps logically as a tree, even though they are implemented as an array. Heaps have their own heap property, and if you go back and review it, you'll find that it is not the same thing as the search tree property. These are two different properties, and they are designed to make different things easy. Back when we talked about heaps, the property (this is for the extract-min version) was the following.

Parents always have to be smaller than their children. That's different from the search tree property, which says that stuff to the left is smaller than you and stuff to the right is bigger than you. In heaps, we have the heap property so that identifying the minimum value is trivial: it's guaranteed to be at the root. Heaps are designed so that you can find the minimum easily; search trees are designed so that you can search easily, and that's why you have this different search tree property. If you want to get smaller, you go left. If you want to get bigger, you go right.

One point that's important to understand early, and this will be particularly relevant once we try to enforce balancing in subsequent videos, is that for a given set of keys, you can have a lot of different search trees. On the previous slide, I drew one search tree containing the key values one, two, three, four, five. Let me redraw that exact same search tree here. If you stare at this tree for a little while, you'll agree that in fact, at every single node of this tree, all of the things in the left subtree are smaller and all of the things in the right subtree are bigger. However, let me show you another valid binary search tree with the exact same set of keys. In this second search tree, the root is five, the maximum value, and no node has a right child; only the left children are populated, and they go five, four, three, two, one in descending order. If you check here again, it has the property that at every node, everything in the left subtree is smaller, and everything in the right subtree, in this case empty, is bigger. So, extrapolating from these two cartoon examples, we can surmise that for a given set of n keys, search trees containing those keys can vary in height anywhere from the best-case scenario of a perfectly balanced binary tree, which is going to have logarithmic height, to the worst case of one of these linked-list-like chains, whose height is going to be linear in the number of keys n. And just to remind you, the height of a search tree, which is also sometimes called the depth, is the largest number of hops it ever takes to get from the root to a leaf. So, the first search tree here has height two, and the second search tree has height four. If the search tree is perfectly arranged, with the number of nodes essentially doubling at every level, then you're going to run out of nodes around depth log base two of n. And in general, if you have a chain of n keys, the height is going to be n - 1, but we'll just call it n amongst friends.

So, now that we understand the basic structure of binary search trees, we can actually talk about how to implement all of the operations that they support. As we go through most of the supported operations one at a time, I'm just going to give you a really high-level description. It should be enough for you to code up an implementation if you want, or as usual, if you want more details or actual working code, you can check on the web or in one of the many good programming or algorithms textbooks. So, let's start with really the primary operation, which is search. We've already essentially discussed how searching is done when we discussed the search tree property. Again, the search tree property makes it obvious how to look for something in a search tree; pretty much you just follow your nose, you have no other choice. So, you start at the root; it's the obvious place to start.
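Since the "follow your nose" idea already fell out of the search tree property, here is the whole descent as a sketch in Python. It assumes a node type with `key`, `left`, and `right` fields (parent pointers aren't needed for search, so they're omitted here); the names are chosen for illustration:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(node, key):
    """Descend from node using the search tree property.
    Returns the node holding key, or None if key is absent."""
    while node is not None:
        if key == node.key:
            return node        # found it
        elif key < node.key:
            node = node.left   # if key is anywhere, it's in the left subtree
        else:
            node = node.right  # if key is anywhere, it's in the right subtree
    return None                # ran out of pointers: key is not in the tree

# The five-node example tree: root 3, left child 1 (with right child 2),
# right child 5 (with left child 4).
root = Node(3, Node(1, right=Node(2)), Node(5, left=Node(4)))
```

For example, `search(root, 4)` goes right from 3 to 5, then left to 4, and returns that node, while `search(root, 6)` falls off the right of 5 and returns `None`, exactly the two terminations described next.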

If you're lucky, the root is what you are looking for, and then you stop and return the root. More likely, the root's key is either bigger or smaller than the key that you're looking for. Now, if the key you are looking for is smaller than the key at the root, where are you going to look? Well, the search tree property says that if it's in the tree, it's got to be in the left subtree, so you follow the left child pointer. If the key you're looking for is bigger than the key at the root, where has it got to be? It's got to be in the right subtree, so you just recurse on the right subtree. In this example, if you're searching for, say, the key two, obviously you're going to go left from the root. If you're searching for the key four, obviously you're going to go right from the root. So, how can the search terminate? Well, it can terminate in one of two ways. First of all, you might find what you're looking for. In this example, if you search for four, you're going to traverse a right child pointer, then a left child pointer, and then, boom, you're at the four and you return successfully. The other way the search can terminate is with a null pointer. So, in this example, suppose you were looking for a node with key six; what would happen?

Well, you start at the root; three is too small, so you go to the right. You get to five; five is still too small, because you're looking for six, so you try to go right, but the right child pointer is null. And that means six is not in the tree. If it were anywhere in the tree, it had to be to the right of the three, and it had to be to the right of the five, but you tried that and ran out of pointers, so six isn't there. You return, correctly, with an unsuccessful search. Next, let's discuss the insert operation, which really is just a simple piggybacking on the search that we just described. So, for simplicity, let's first think about the case where there are no duplicate keys. The first thing to do in an insertion is to search for the key k.

Now, because there are no duplicates, this search will not succeed; the key k is not yet in the tree. So, for example, in the picture on the right, we might think about trying to insert the key six. What's going to happen when we search for six? We follow a right child pointer, we go from three to five, and then we try to follow another one and get stuck: there's a null pointer going to the right of five.

When this unsuccessful search terminates at a null pointer, we just rewire that pointer to point to a node with the new key k. Now, if you want to permit duplicates in the data structure, you've got to tweak the insertion code a little bit, but really barely at all. You just need some convention for handling the case when you do encounter the key that you are about to insert. For example, if the current node has a key equal to the one you're inserting, you could have the convention that you always continue into the left subtree, and then you continue the search as usual, eventually terminating at a null pointer, and you stick in the newly inserted node by rewiring that null pointer to point to it. One good exercise for you to think through, which I'm not going to say more about here, is that when you insert a new node this way, you retain the search tree property. That is, if you start with a search tree, a tree where at every node the stuff to the left is smaller and the stuff to the right is bigger, and you insert something following this procedure, you will still have the search tree property after the new node has been inserted. That's something for you to think through.
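The insert procedure just described, search until you hit a null pointer and then rewire it, can be sketched as follows. This is a minimal sketch in Python, assuming a node type with `key`, `left`, and `right` fields (names chosen for illustration), using the tie-breaking convention above that equal keys go left. The `in_order` helper is a handy way to check the exercise: an in-order walk of a valid search tree always lists the keys in sorted order.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Descend as in search until a null pointer is found, then rewire
    that pointer to a fresh node. Returns the root of the tree.
    Equal keys are sent left, per the tie-breaking convention."""
    if root is None:
        return Node(key)                # empty tree: the new node is the root
    node = root
    while True:
        if key <= node.key:
            if node.left is None:
                node.left = Node(key)   # rewire the null left pointer
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)  # rewire the null right pointer
                return root
            node = node.right

def in_order(node):
    """In-order traversal; sorted output certifies the search tree property."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

# Rebuild the example tree, then insert six: the search for six falls off
# the right of five, so six becomes five's right child.
root = None
for k in [3, 1, 5, 2, 4]:
    root = insert(root, k)
root = insert(root, 6)
```

After these insertions, `in_order(root)` produces the keys from smallest to largest, which is exactly the linear-time sorted printout mentioned at the start of the video.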
