In this session, we're again looking at proving things about functional programs. More specifically, we're going to prove laws about data structures using structural induction on trees. You know structural induction on lists from the first half of the course. But structural induction is in fact not limited to lists. It applies to any tree structure. The general structural induction principle is like this. To prove a property P(t) for all trees t of a certain type, show that P(l) holds for all leaves l of a tree. And for each type of internal node t with subtrees s1, ..., sn, show that P(s1) and ... and P(sn) together imply P(t). So if you have a tree with leaves here and some internal nodes, something like that, then the structural induction principle says, well, prove it for all leaves. And assuming that you have already shown it for the subtrees, let's say for this one and this one, show it for the interior nodes as well. If you can do that, then you have established the property for all trees.
So to demonstrate that proof principle, let's go back to IntSets. Recall our definition of IntSet. There was an abstract trait with two operations, include (spelled incl in the code) and contains. And that was implemented either by an object Empty, where you see the definitions of contains and include like that, or by a class NonEmpty, where you saw the definitions of contains and include like that. For NonEmpty we always had a case distinction on whether the element to look up or to include is less than, equal to, or greater than the element at the root. And depending on that, we would then proceed with the left or the right subtree. So the invariant of these sets was that elements are ordered, in the sense that if you have a tree here, and let's say the element is 8, and then we have some subtrees here and some subtrees here, then all elements in the left subtree would be less than the element at the root, so less than 8 in this case.
And all elements in the right subtree would be greater than that root element. So that's a class hierarchy that claims to implement IntSet. But how do we know it's correct? Or what does it even mean to say an implementation is correct for IntSets? What is the specification of IntSet? For that, it's good to go back to the introduction in the very first week of this course, where we talked about paradigms and why functional programming is based on theories. There we said a theory is characterized by some values, operations on these values, and laws that relate the operations. So we can do that for IntSet. And if we have laws for IntSet, then one way to define and show the correctness of an implementation is to prove that the laws of IntSet hold for this implementation.
So what are the laws of IntSet? I propose those three here. The 1st law says, Empty contains(x) is always false. The Empty set contains nothing. The 2nd law says, s include(x) contains(x) is true for any set s. So if we include an element in the set and then ask whether the set contains the same element, then that's true. And the 3rd law says that s include(x) contains(y) equals s contains(y) if x is not equal to y. So that means if I include an element x in a set, and then ask whether the set contains another element different from the included element, then that's the same as asking whether the original set contains that other element. So these laws obviously all make sense for IntSets. And furthermore, one can convince oneself that these laws completely characterize what makes up an IntSet, if we restrict ourselves to just the operations contains and include. So we now proceed to prove these three laws using structural induction on trees. The material that follows is a bit more on the theoretical side. If you're more practically minded, you might want to just quickly browse it, or skip it altogether and continue with the next session.
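To make the three laws concrete, here is a minimal sketch of how the IntSet hierarchy and the laws might be written in Scala. It is a reconstruction from the description above, not the exact code from the slides: the field names elem, left and right, the helper fromList, and the object IntSetLaws with law1, law2, law3 and holdsOnSamples are all assumptions made for illustration. Checking a handful of samples is of course no substitute for the proof that follows.

```scala
// Assumed reconstruction of the hierarchy described in the lecture.
sealed trait IntSet {
  def incl(x: Int): IntSet
  def contains(x: Int): Boolean
}

case object Empty extends IntSet {
  def contains(x: Int): Boolean = false
  def incl(x: Int): IntSet = NonEmpty(x, Empty, Empty)
}

// Invariant: everything in `left` is < elem, everything in `right` is > elem.
final case class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  def contains(x: Int): Boolean =
    if (x < elem) left.contains(x)
    else if (x > elem) right.contains(x)
    else true

  def incl(x: Int): IntSet =
    if (x < elem) NonEmpty(elem, left.incl(x), right)
    else if (x > elem) NonEmpty(elem, left, right.incl(x))
    else this
}

// Hypothetical helper object: the three laws as Boolean-valued functions.
object IntSetLaws {
  // Law 1: Empty contains x = false
  def law1(x: Int): Boolean = !Empty.contains(x)

  // Law 2: (s incl x) contains x = true
  def law2(s: IntSet, x: Int): Boolean = s.incl(x).contains(x)

  // Law 3: (s incl x) contains y = s contains y, provided x != y
  def law3(s: IntSet, x: Int, y: Int): Boolean =
    x == y || s.incl(x).contains(y) == s.contains(y)

  // Builds a set by including the elements one after another.
  def fromList(xs: List[Int]): IntSet =
    xs.foldLeft(Empty: IntSet)((s, x) => s.incl(x))

  // Spot-check the laws on a few small sets and a small range of elements.
  def holdsOnSamples: Boolean = {
    val samples = List(Nil, List(8), List(8, 3, 10), List(5, 1, 9, 7, 2)).map(fromList)
    val range = -1 to 11
    range.forall(law1) &&
      samples.forall { s =>
        range.forall(x => law2(s, x) && range.forall(y => law3(s, x, y)))
      }
  }
}
```

Calling IntSetLaws.holdsOnSamples evaluates all three laws on the sample sets; the structural induction proof is what extends the result from these samples to every IntSet.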
So, let's go on and prove these laws. The 1st law is Empty contains(x) equals false. Well, that's obviously true, because that's how contains was defined on Empty. So according to the definition of contains, that's false.
The 2nd law is s include(x) contains(x) equals true. We prove that by structural induction on the tree s. The base case is that s is Empty. Then we have to prove Empty include(x) contains(x). Expanding include gives us NonEmpty(x, Empty, Empty) contains(x). And then by the definition of contains for NonEmpty trees, that is true. So let's now prove the induction step. The induction step means that our set s here is of the form NonEmpty, then some root element, some left tree, some right tree. And there are several cases to consider. The first case we consider is that the root element and the element x we include are the same. So we have NonEmpty(x, l, r) include(x) contains(x), and that one simplifies directly, because including an element in a set that already has that element at the top gives the set itself. And asking whether this set contains(x) is, according to the definition of contains, true. So the law holds for that case. But that was just one of the possible cases for the induction step. Let's look at the others. The next case would be NonEmpty(y, l, r) for the set s, where y is less than x. What do we have in that case? We have NonEmpty(y, l, r) include(x). What does that give? Well, according to the definition of include, if we include x in a set where x is greater than the root element, we recursively include it in the right subtree and re-form the set with the remaining elements. So that's what you see in this expression here, NonEmpty(y, l, r include(x)). And we ask whether that expression contains(x). Now we look at the definition of contains. It says, well, if x is greater than the root element y, then look in the right subtree. So that gives us r.incl(x).contains(x). And now we can use the induction hypothesis.
Because we are allowed to assume that the property holds on each of the two subtrees of this NonEmpty set. So we know for the tree r that r include(x) contains(x), that's the second law, is true. And that means that the same holds for the NonEmpty tree here. There's a third case to consider, namely that we have a set of the form NonEmpty(y, l, r) where now y is greater than x. And that one is of course completely analogous, so I won't repeat the reasoning here. You can try it for yourself if you want.
So the last proposition is this one here. If we have a set, let's call it xs, and an element x which is different from y, then xs include(y) contains(x) is the same as xs contains(x). So asking whether a set contains an element after another element is included is the same as asking whether the original set contains that element. We prove that law by structural induction on the set xs. There are two cases to consider: if x is different from y, then either x is smaller than y or x is larger than y. And we just assume the 2nd case, that x is larger than y. The case where x is smaller than y is again completely analogous. So let's look at the base case. We have Empty include(y) contains(x), and we have to show that it equals Empty contains(x). That's the right hand side here. So let's expand include. That gives us NonEmpty(y, Empty, Empty) contains(x). Since x is larger than y, that means we have to look in the right subtree. So it's Empty contains(x). And that is our right hand side. So that's what we needed to prove here. We've established the law for the case Empty. For the inductive step, it gets a bit more complicated. Here we need to consider a tree NonEmpty(z, l, r), where the element z and the left and right subtrees l and r are arbitrary. And now it depends on where z is relative to y and x. So we have y, and we have x. And essentially z could be anywhere here. It could be less than y, it could be between y and x, it could be greater than x, or it could be one of y and x. So that gives us five cases.
z is x, z is y, z is less than y, z is between y and x, and z is greater than x. So let's tackle each of these five cases in turn. The first case is z equals x. So we have NonEmpty(x, l, r) include(y) contains(x). And we have to show that this is the same as NonEmpty(x, l, r) contains(x). We know that y is less than x. So we construct a left subtree with y included and wrap the NonEmpty node around it. That gives NonEmpty(x, l include(y), r). And we ask again whether it contains(x). Now the answer is true, of course, because x is at the root. And that's also the answer if we ask NonEmpty(x, l, r) contains(x). So we have established the equality.
What if z equals y? Then our induction step starts from NonEmpty(y, l, r) include(y) contains(x). And we have to show that this equals NonEmpty(y, l, r) contains(x). That follows directly from the definition of NonEmpty.include: if we include an element that's already at the root, then we return the original set. So we're done with the two simple cases, z equals x and z equals y.
So what about the case where z is the smallest of the three elements? So z is smaller than y. We now have NonEmpty(z, l, r), where z is smaller than y and, as always, y is smaller than x. Our initial expression is NonEmpty(z, l, r) include(y) contains(x). And we have to show that that's equal to NonEmpty(z, l, r) contains(x). By the definition of include, we now include y in the right subtree, because y is greater than z. That gives NonEmpty(z, l, r include(y)). And we ask whether it contains x. z is smaller than x as well, so by the definition of contains we look in the right subtree, which gives r include(y) contains(x). And by the induction hypothesis, that is r contains(x). And if we compare that to the right hand side, NonEmpty(z, l, r) contains(x), then we see that we again have equality, because that's how contains is defined on that node: x is greater than z, so we look in the right subtree. So we have established a four step equality between the left hand side of the law and the right hand side of the law.
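Written out as a chain of equations, the four steps for the case z < y < x look roughly like this. The notation is only a sketch: incl stands for the include operation, and the justification for each step is given on the right.

```latex
\begin{align*}
      & \mathrm{NonEmpty}(z, l, r)\ \mathrm{incl}\ y\ \mathrm{contains}\ x \\
  ={} & \mathrm{NonEmpty}(z, l, r\ \mathrm{incl}\ y)\ \mathrm{contains}\ x
      && \text{definition of incl, since } z < y \\
  ={} & (r\ \mathrm{incl}\ y)\ \mathrm{contains}\ x
      && \text{definition of contains, since } z < x \\
  ={} & r\ \mathrm{contains}\ x
      && \text{induction hypothesis on } r \\
  ={} & \mathrm{NonEmpty}(z, l, r)\ \mathrm{contains}\ x
      && \text{definition of contains, read right to left, since } z < x
\end{align*}
```

The induction hypothesis is used exactly once, on the right subtree r, which is the subtree that both include and contains descend into in this case.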
So the second of the remaining cases for the induction step is where z is between y and x. We have the same left hand side as before. And again we have to show that it equals NonEmpty(z, l, r) contains(x). Since y is now smaller than z, we include y in the left subtree and wrap it in NonEmpty, as you see here. And we ask whether it contains x. That's the same as r contains(x), because x is greater than z. And that's the same as NonEmpty(z, l, r) contains(x) by the definition of contains: since x is greater than z, we search in the right subtree, so this expands to that term over here. The third case is where z is now greater than both y and x. We have again the same left hand side, and we need to prove again that it equals NonEmpty(z, l, r) contains(x). We proceed as before. In the first step, we know that include will expand to an inclusion of y in the left subtree. And we ask contains(x), and x is also less than z. So to ask whether this set contains x, we look again in the left subtree. So we have l include(y) contains(x). Now by the induction hypothesis, that's the same as l contains(x). And again by the definition of NonEmpty contains, applied in reverse, that's the same as NonEmpty(z, l, r) contains(x). These are all the cases, so the proposition is established.
So, if you want to explore this further, here's a harder optional exercise that you might consider. Let's suppose we add a function union to IntSet. We have seen it already. Here's a possible implementation of union. For the NonEmpty set we say, to take the union of the NonEmpty set with some other set, we take the left set, union the right set, union the other set, and then we include the root element x at the end. So what does it mean to say that this or any other implementation of union is correct? Well, we need to have a law for union, a way to relate it to the other operators that we have. There's one quite obvious law for union, it's this one here.
It says, well, if we take the union of two sets and then ask whether that contains an element x, then that should be the same as asking whether the left operand contains x or the right operand contains x. So your task is to show this proposition by using structural induction on the set xs.
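Before attempting the proof, it can help to see the union law as executable code. The following is a minimal sketch under assumptions: the hierarchy is reconstructed from the description in this session, the grouping of the two unions in NonEmpty.union is one plausible reading of "left union right union other", and the names UnionSketch, elem, left, right, fromList, unionLaw and holdsOnSamples are made up for illustration. Testing samples does not replace the exercise; it only shows what has to be proved.

```scala
object UnionSketch {
  sealed trait IntSet {
    def incl(x: Int): IntSet
    def contains(x: Int): Boolean
    def union(other: IntSet): IntSet
  }

  case object Empty extends IntSet {
    def contains(x: Int): Boolean = false
    def incl(x: Int): IntSet = NonEmpty(x, Empty, Empty)
    // The union of the empty set with any set is that set.
    def union(other: IntSet): IntSet = other
  }

  final case class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
    def contains(x: Int): Boolean =
      if (x < elem) left.contains(x)
      else if (x > elem) right.contains(x)
      else true

    def incl(x: Int): IntSet =
      if (x < elem) NonEmpty(elem, left.incl(x), right)
      else if (x > elem) NonEmpty(elem, left, right.incl(x))
      else this

    // Assumed grouping: union the subtrees with `other`, then include the root.
    def union(other: IntSet): IntSet =
      left.union(right.union(other)).incl(elem)
  }

  // Builds a set by including the elements one after another.
  def fromList(xs: List[Int]): IntSet =
    xs.foldLeft(Empty: IntSet)((s, x) => s.incl(x))

  // The law from the exercise:
  // (xs union ys) contains x = (xs contains x) || (ys contains x)
  def unionLaw(xs: IntSet, ys: IntSet, x: Int): Boolean =
    xs.union(ys).contains(x) == (xs.contains(x) || ys.contains(x))

  // Spot-check the law on all pairs of a few small sample sets.
  def holdsOnSamples: Boolean = {
    val samples = List(Nil, List(8), List(8, 3, 10), List(5, 1, 9, 7, 2)).map(fromList)
    samples.forall { xs =>
      samples.forall(ys => (-1 to 11).forall(x => unionLaw(xs, ys, x)))
    }
  }
}
```

In the proof itself, the base case uses Empty.union directly, and the step for NonEmpty will likely need the two include laws proved earlier in order to deal with the final include of the root element.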