0:31

node i is equal to the height of the tree rooted at vertex i.

However, it is still true that the rank is an upper bound on the corresponding height.

Let me illustrate this with a toy example.

So, assume that we have a tree like this.

So, this is the root, say vertex 5.

Here we have vertex 2, and we have node 3 and, say, node 6.

Assume that currently the rank of this node is 2, this one is say 0, this is 1, and this is also 0.

We now call find of 6.

This will re-attach 6 to the current root of this tree.

So what we get is the following: 5, 3, and also 6 now stays here, all right?

So we see that the height of this tree is equal to 1.

However, the rank of the root is still equal to 2.

Recall that find doesn't use and doesn't change any rank values.

Also, for node 3: the height of the tree rooted at element 3 is equal to 0, however its rank is equal to 1.

Well, intuitively it is clear that path compression can only decrease the height.

So for this reason, the rank is no longer equal to the height of the corresponding tree; however, the height is at most the rank.
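The effect just described can be sketched in code. This is a minimal sketch, assuming a parent map where parent[i] == i marks a root; the node labels and ranks follow the toy example above (5 is the root of rank 2, 2 and 3 are its children, and 6 is a child of 3).

```python
# Node labels and ranks are assumptions matching the lecture's drawing.
parent = {5: 5, 2: 5, 3: 5, 6: 3}
rank = {5: 2, 2: 0, 3: 1, 6: 0}

def find(i):
    if parent[i] != i:
        # path compression: reattach i directly to the root on the way back
        parent[i] = find(parent[i])
    return parent[i]

find(6)
print(parent[6])  # 6 is now attached directly to the root 5
print(rank[5])    # still 2: find neither uses nor changes ranks
```

After the call, the tree has height 1, while the rank of the root stays 2 and the rank of node 3 stays 1, exactly as in the example.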

Okay.

Another important thing is the following.

It is still true that for any root node i of rank k, the corresponding subtree contains at least 2 to the k elements.

And this can be seen by realizing that path compression does not affect any root nodes.

I mean, if we have a root node whose rank is k, then no matter how many times and for which nodes in this subtree we call find with path compression, all these nodes are still in the subtree rooted at this node, exactly because this node is a root.

So any node from this subtree cannot be attached to some other vertex in some other subtree.

So once again, if we have a node which is still a root and whose rank is k, then the corresponding subtree contains at least 2 to the k elements, 2 to the k nodes.

On this slide we discuss a few more important properties.

The first one of them says that we have, in our forest, at most n divided by 2 to the k nodes of rank k.

Why is that?

Well, recall that if we create a new node of rank k

4:13

By saying too many, I mean that their number is greater than n divided by 2 to the k.

Then overall, we have more than n elements.

Which is a contradiction, right?
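This counting property is easy to check empirically. The following is a sketch, not a proof, using plain union by rank (no path compression; the property holds either way) with hypothetical sizes and a fixed random seed:

```python
import random
from collections import Counter

n = 1024
parent = list(range(n))
rank = [0] * n

def find(i):
    while parent[i] != i:
        i = parent[i]
    return i

def union(i, j):
    ri, rj = find(i), find(j)
    if ri == rj:
        return
    if rank[ri] < rank[rj]:
        ri, rj = rj, ri          # ri is now the root of larger rank
    parent[rj] = ri
    if rank[ri] == rank[rj]:
        rank[ri] += 1

random.seed(0)
for _ in range(5 * n):
    union(random.randrange(n), random.randrange(n))

# at most n / 2^k nodes of rank k, for every k
counts = Counter(rank)
print(all(counts[k] <= n // 2**k for k in counts))  # True
```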

The second property is that, when we go up, the rank strictly increases.

Well, this was clear if we did not use path compression: if the rank is equal to the height of the corresponding subtree, then this is completely clear.

Let me recall that if we have, for example, a tree

Â 6:26

We now start to estimate the running time of m operations.

First of all, note that the union operation boils down to two calls to the find operation, and also to a constant number of other operations.

Namely, when we have two roots that were found by two calls to the find operation, we need to hang one of them below the other one, which is a constant-time operation.

We just need to change one parent.

And also, possibly, we need to change one rank value.
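The constant amount of work besides the two find calls can be seen in a sketch of union; the names parent, rank, and find are assumptions for illustration, not fixed by the lecture:

```python
parent = list(range(8))
rank = [0] * 8

def find(i):
    while parent[i] != i:
        i = parent[i]
    return i

def union(i, j):
    ri, rj = find(i), find(j)   # the two calls to find
    if ri == rj:
        return                  # already in the same set
    if rank[ri] < rank[rj]:
        ri, rj = rj, ri         # make ri the root of larger rank
    parent[rj] = ri             # change one parent: O(1)
    if rank[ri] == rank[rj]:
        rank[ri] += 1           # possibly change one rank value

union(1, 2)
union(3, 4)
union(1, 3)
print(find(4) == find(2))  # True: one merged set
```

Everything after the two find calls is a couple of comparisons and at most two assignments, which is why the analysis can charge the whole cost of union to its find calls.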

Okay.

So for this reason, when estimating the total running time we will just assume that we have m calls to the find operation.

Note that each find operation traverses some path from a node to the root of the corresponding tree.

So we traverse some number of edges.

So the total running time of all the find operations, of all the calls to the find operation, is just the total number of edges traversed.

So this is what is written here: we just need to count the number of edges from a node i to its parent j traversed during these calls.

For technical reasons we will split this number into three terms.

In the first term we account for all the edges that lead from a node to the root of the corresponding tree.

So the first term includes all the edges that lead from a node to another node which is the root.

The second term includes all the remaining edges where we go from i to j such that log* of rank of i is strictly smaller than log* of rank of j, okay?

And the remaining term accounts for everything else, namely for all the edges where we go from i to j such that j is not the root and log*(rank[i]) = log*(rank[j]).

We're going to show separately for each of these terms that it is upper bounded by big O of m multiplied by log* of n.

Let me show this on a small example, as usual.

Assume that we have such a path that we're going to traverse: a path from the node at the bottom to the root of the corresponding tree.

The numbers shown here indicate the ranks of the corresponding nodes.

Then these two edges will be accounted for in the first term, just because they lead from a node to the root of the corresponding tree.

This edge, for example, will be accounted for in the last term, because the rank of this node is 17 and the rank of this node is 21, and the log stars of these numbers are equal.

At the same time, here we have rank 14, and here we have rank 17, and the log stars of these two numbers are different.

For this reason this edge will be accounted for in the second term, okay?
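The classification of edges on such a path can be sketched as follows. The leaf-to-root rank sequence 14, 17, 21 comes from the example; the root's rank of 25 is an assumed value added for illustration:

```python
import math

def log_star(x):
    # iterated logarithm: how many times log2 is applied until x <= 1
    c = 0
    while x > 1:
        x = math.log2(x)
        c += 1
    return c

ranks = [14, 17, 21, 25]          # ranks along the path; ranks[-1] is the root
terms = []
for a, b in zip(ranks, ranks[1:]):
    if b == ranks[-1]:
        terms.append(1)           # the edge leads to the root
    elif log_star(a) < log_star(b):
        terms.append(2)           # log* of the rank strictly increases
    else:
        terms.append(3)           # same log* value, parent is not the root
print(terms)  # [2, 3, 1]
```

The edge 14 -> 17 falls into the second term (log* jumps from 3 to 4), the edge 17 -> 21 into the third (both have log* equal to 4), and the final edge into the first, matching the example.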

So on the next sequence of slides, we are going to estimate separately each of these three terms.

And for each of them, we are going to show that it is at most big O of m multiplied by log* of n.

The first term is easy to estimate.

Recall that in this term we account for all the edges traversed by the find operation where we go from node i to its parent j such that j is the root.

Clearly, for each call to the find operation there are at most two such edges, right?

Which means that we have an upper bound of big O of m.

In the second term, we need to estimate the total number of edges traversed during all m calls to the find operation such that we go from node i to its parent j, where j is not the root and also log star of rank of i is strictly less than log star of rank of j.

We're going to prove here that it is upper bounded by big O of m multiplied by log star of n.

Â 11:27

the root of the corresponding tree, the rank always increases.

However, the rank of the root is at most log n, which means that during one call to the find procedure the log star of the rank can only increase log star of n times.

Okay.

This is just because we have an upper bound for the rank of the root: it is upper bounded by log n, which means that there are only log star of n different possibilities for the log star of the rank of all the nodes on this path.

Which means that this value can only increase, at most, log star of n times.

And we have at most m calls to the find operation.

Which gives us an upper bound of big O of m multiplied by log star of n.

Now it remains to estimate the last term, where we account for all the edges traversed during m calls to the find operation where we go from node i to its parent j such that j is not the root, first of all, and, second, log star of rank of i is equal to log star of rank of j.

What we're going to show is that the total number of such edges is upper bounded by big O of n multiplied by log star of n.

Note that this is even better than what we need.

What we need is a weaker upper bound, which is m log star of n.

Recall that we know that m is greater than n, just because m is the total number of operations, while n is the number of calls to the MakeSet operation.

To estimate the required term, consider a particular node i and assume for concreteness that its rank lies in an interval from k plus 1 to 2 to the k.

Recall that this was the form of the intervals on which the log star function is constant.

Okay?
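A sketch of the log star function and its intervals of constancy; the interval boundaries of the form {k+1, ..., 2^k} match the lecture, while the concrete sample values are added for illustration:

```python
import math

def log_star(n):
    # how many times log2 must be applied before the value drops to 1 or below
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

# log* is constant on intervals of the form {k+1, ..., 2^k}:
# e.g. for k = 16, every rank in {17, ..., 65536} has the same log*.
print(log_star(17), log_star(65536))  # 4 4
print(log_star(10**80))               # 5: still tiny at astronomical scale
```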

Â Okay?

Â 14:42

And this in turn means that after at most 2 to the k calls to find of i, node i will be adopted by a new parent whose rank, for sure, does not lie in this interval.

Just because the ranks in this interval are at most 2 to the k.

So if we increase the rank of the parent of i at least 2 to the k times, it will be greater than 2 to the k for sure.

Right.

We now have everything to conclude this proof, to conclude estimating the required term.

Recall that we are estimating the total number of edges traversed during m calls to find, where we go from a node i to its parent j such that log star of rank of i is equal to log star of rank of j.

So we have just proved that there are at most n divided by 2 to the k nodes whose rank lies in the interval from k plus 1 to 2 to the k.

We then explained why any such node, whose rank lies in such an interval, contributes at most 2 to the k to the term that we are currently estimating.

This means that all the nodes whose rank lies in such an interval contribute, at most, big O of n to this term, right?

Now it remains to note that the total number of different intervals is at most log star of n, which means that the total contribution of all such nodes i to the estimated term is at most big O of n multiplied by log star of n.

Â 16:31

We now conclude the whole lesson.

So in this lesson we considered an implementation of the disjoint sets data structure where each set is represented by a tree, and for this reason we have a forest of trees.

The ID of each set is just the root of the corresponding tree.

We then considered two heuristics.

The first one is called union by rank.

It says that when merging two trees, it makes sense to attach a shorter one to the root of a taller one, okay?

This helps to keep the trees in our forest shallow.

And for this, we keep the rank of the subtree rooted at each particular node in a separate array called rank.

Â 17:29

The idea is the following.

When we traverse a path from a node to the root of the corresponding tree, we may want to reattach all the nodes found on this path directly to the root, because this will potentially save us time for further find operations, right?

We then proved that the resulting implementation turns out to be extremely efficient: namely, the amortized running time of each operation is big O of log star of n.

And log star of n is an extremely slowly growing function.

In particular, log* n is at most five for all practical values of n.
