Now we look at parallel scan operations on collections of elements. We have seen several operations on collections so far, and how to implement them in parallel. We have seen map, which applies a given function to each element of a list, producing a new list of the same length. We have seen fold and its variants, which take a list of elements and combine all the elements with a given operation. Now we consider scan. The scan operation on ordered collections of elements has aspects of both map and fold. Like map, what we obtain is another collection. Suppose we have the collection List(1, 3, 8), containing 1, 3, and 8 in that order. Let's do scan with the initial element 100 and the binary operation of addition. The result is a list that has one element more than the original list. The very first element is always the initial element we were given, and each subsequent element is obtained by applying the given operation to the previous result and the corresponding element of the input list. So we start with our initial element, and to obtain the next one we add the first element of the list, because our operation is addition. We then add the second element of the list, and finally the last element of our list. Slightly more symbolically, given a list consisting of elements a1, a2, and a3, when we apply scanLeft with initial element a0 and some binary operation f, the result is the list b0, b1, b2, b3, computed as follows. b0 is just the given initial element. Then b1 is computed by taking b0 and combining it using f with a1. We take b1 and combine it with a2 to obtain b2, and we take b2 and combine it with a3 to obtain b3. Throughout this segment, we will assume that the operation f is associative. Even in that case, there exists a dual operation, scanRight, which is different from scanLeft.
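As a quick check, Scala's standard library already provides scanLeft and scanRight on lists, so the examples in this lecture can be reproduced directly before we build our own implementations:

```scala
// scanLeft: b0 = a0, and b(i) = f(b(i-1), a(i))
val fromLeft = List(1, 3, 8).scanLeft(100)(_ + _)
// fromLeft == List(100, 101, 104, 112)

// scanRight is the dual operation: the initial element ends up last
val fromRight = List(1, 3, 8).scanRight(100)(_ + _)
// fromRight == List(112, 111, 108, 100)
```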
If, for example, we take the same List(1, 3, 8) and apply scanRight with initial element 100 and the associative operation of addition, the result is List(112, 111, 108, 100). Note that it is now the last element of the list that equals the given initial element, and each previous element is obtained by adding the element at the corresponding position of the input list. Everything we say today about scanLeft also applies, dually, to scanRight. So when we have a list of n elements, scanning produces a list of n + 1 elements, here indexed from 0 to n, such that the first element is the given initial element and each subsequent element is obtained from the previous one using the equation above. You can now give a sequential definition of scanLeft. Let's consider a scanLeft whose input is an array, together with the initial element a0 and the binary operation f. The result should be written into the output array out. We assume that the length of the output array, out.length, is at least the length of the input array plus one, so that we can fit all the elements we need to produce. Here is one sequential solution. The first element of the output is a0 itself. We then compute subsequent elements in a loop: the current value that we are writing is stored in the variable a, and the index is i. After a0, the first element computed is f(a, inp(0)), where a is initially a0; that element is written to out(1), and we continue in the loop. You can see that there is one more write to the array than there are reads from it. Can we make scanLeft a parallel operation? Remember, we are assuming that f is associative. We have the somewhat ambitious goal of coming up with an algorithm that runs in O(log n), given enough parallelism. At first, this task seems almost impossible, because the value of the last element in the sequence is computed from the previous element.
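The sequential solution just described can be written out as follows; this is a sketch following the lecture's description, with the array-based signature described above:

```scala
// Sequential scanLeft: out must have room for inp.length + 1 elements.
def scanLeft[A](inp: Array[A], a0: A, f: (A, A) => A, out: Array[A]): Unit = {
  out(0) = a0          // the first output element is always a0
  var a = a0
  var i = 0
  while (i < inp.length) {
    a = f(a, inp(i))   // combine the running value with the next input
    i = i + 1
    out(i) = a         // one more write than there are reads from inp
  }
}
```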
For every element, it looks as if the natural way to compute it is indeed what we gave in the sequential algorithm. But even if we parallelized the individual applications of f, we would not be able to parallelize the traversal of the array itself, so this would still give us a linear algorithm, even with infinite parallelism. We therefore need to perform the computation in a different order, and the idea is to give up reusing all intermediate results. In fact, we will do more work, with more applications of f than the simple sequential version needs. However, this will allow us to improve parallelism and, in terms of parallel running time, more than compensate for the fact that we apply f a few more times than the sequential algorithm does. To get intuition for why we can even do this in parallel, let's try to express scan using map and reduce, for which we have previously seen parallel implementations. Assume that the input is given in the array inp, and the boundaries of the segment we are interested in are given by left and right. The initial element is a0, and the binary operation is f. Then reduce is used simply to reduce a segment of the array, and we can also use map, which applies an operation to a given array segment and writes to the resulting output array. We will use a slightly modified variant of map where the function that determines the mapping is given not only the value stored at a given position of the array but also the index at which that value is stored. Can you implement scanLeft with an invocation of map and reduce? Here is one solution; it follows directly from the definition of scan. The element at position i of the output is the result of reducing the segment of the input array up to that position. Therefore, the resulting array is obtained using a map over the input array, where the function fi given to map is, in fact, going to reduce the array segment from 0 to i.
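The solution can be sketched as below. The helper names reduceSeg1 and mapSeg, and the name scanLeftViaMapReduce, are chosen here for illustration; in the lecture both helpers have parallel implementations, while this sketch writes them sequentially:

```scala
// Sequential reduce of inp(left..right-1), starting from a0.
def reduceSeg1[A](inp: Array[A], left: Int, right: Int,
                  a0: A, f: (A, A) => A): A = {
  var a = a0
  var i = left
  while (i < right) { a = f(a, inp(i)); i += 1 }
  a
}

// Variant of map whose function also receives the index.
def mapSeg[A, B](inp: Array[A], left: Int, right: Int,
                 fi: (Int, A) => B, out: Array[B]): Unit = {
  var i = left
  while (i < right) { out(i) = fi(i, inp(i)); i += 1 }
}

// scanLeft expressed with map and reduce: out(i) reduces inp(0..i-1).
// Assumes inp is non-empty.
def scanLeftViaMapReduce[A](inp: Array[A], a0: A, f: (A, A) => A,
                            out: Array[A]): Unit = {
  val fi = (i: Int, v: A) => reduceSeg1(inp, 0, i, a0, f)
  mapSeg(inp, 0, inp.length, fi, out)
  val last = inp.length - 1
  out(last + 1) = f(out(last), inp(last))  // write the final element
}
```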
The invocation of map fills in the output array with elements from index 0 up to and including inp.length - 1. We then just need to write the final element of the output array. That final element is computed by taking the element before it and combining it with the corresponding element of the input array. If map and reduce are implemented in parallel, and they each have logarithmic parallel complexity, then, because map applies all these individual operations in parallel, the overall depth is still O(log n). In the previous solution, we did not reuse any computation across different elements of the output array: each element of the output array was computed entirely independently of the others. A natural question is whether we could reuse at least some of these computations, even if we are not going to use the sequential computation pattern from before. Recall that reduce itself proceeds by applying the operations in a tree, in order to obtain a parallel implementation for an associative operation. The idea, then, is to save these intermediate results: to save this tree and make use of it when computing the output collection. The fact that it is a tree also makes it convenient to use these values. To keep our functions simple, we first assume that the input collection itself is also a tree. This is a different kind of tree from the tree that saves the intermediate results. Here are the definitions of our trees. The input collection is given using a tree in which all the values are stored in leaves, and the nodes contain just references to the left and right subtrees. In contrast, the tree storing intermediate values has an additional res field, even for the internal nodes of the tree. We achieve that by having a common res field in the superclass; this field then exists not only in the leaves but also in the nodes.
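The two kinds of trees just described can be defined as follows; the names LeafRes and NodeRes for the result tree are the naming used in this sketch:

```scala
// Input collections: values are stored only in the leaves.
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

// Trees of intermediate results: the common res field in the superclass
// exists not only in the leaves but also in the internal nodes.
sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A,
                      r: TreeRes[A]) extends TreeRes[A]
```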
Can you then define a reduceRes function that transforms our Tree values into TreeRes values by applying a given binary operation? Here is the signature of reduceRes. The implementation of reduceRes is very simple. Leaves map to leaves with the same value, and nodes invoke the reduction on the left and right subtrees with the same operation; we then build the resulting node. The resulting node has these left and right subtrees as its components, but it also needs to store the new value. To obtain the new value, all we need to do is apply the given binary operation to the results of the left and right subtrees. Let's take a simple example. This is a balanced tree that stores four integers: 1, 3, 8, and 50. If we compute reduceRes on this tree with plus as the operation, we obtain a new tree whose values are the corresponding sums. You can see that the resulting tree has the same shape as the original tree; we just have these additional values. And the value at the root of the overall tree is, in fact, the value of reduce on our initial collection. Of course, we would like to do this computation in parallel, and all we need to do to accomplish that is insert the parallel construct in front of the two recursive invocations. We call the resulting function upsweep, which suggests the bottom-up computation that we use to obtain the tree of results. Given this tree of results, we would now like to produce the scanLeft of our initial collection. For the collection 1, 3, 8, 50 and the initial element 100, the scanLeft should be 100, 101, 104, 112, 162. The computation of scanLeft, given the result tree, is called downsweep. Downsweep takes an initial element a0, which plays an important role here, the tree of results, and the binary operation f. It produces a new collection; in the end we want the bigger tree that includes the initial element, but we will first produce a tree that has the same number of elements as the original one.
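Putting this together, reduceRes and its parallel form upsweep can be sketched as follows. The tree definitions are repeated so the sketch is self-contained, and the recursive calls are written sequentially, with a comment where the lecture's parallel construct would go:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A,
                      r: TreeRes[A]) extends TreeRes[A]

def reduceRes[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = (reduceRes(l, f), reduceRes(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)  // combine the two subtree results
}

// upsweep is reduceRes with the recursive calls done in parallel;
// in the lecture: val (tL, tR) = parallel(upsweep(l, f), upsweep(r, f))
def upsweep[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = (upsweep(l, f), upsweep(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}
```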
To understand how downsweep works, the key fact to remember is that a0 denotes the reduce of all elements that come to the left of the current tree t that we are given. At the very beginning, this is the initial element, 100. As we move down the tree, we pick up the elements that precede the subtree; for example, for this subtree here, what we need to take into account is 100, 1, and 3. When we have a Leaf, we simply apply the operation f to the given element a0 and the element in the Leaf. The interesting case is, of course, the case of a Node. We again recursively do a downsweep on the left and the right subtree. This gives us two new trees, left and right, which we then combine. The key question is: what initial element do we pass to these two subtrees? What are the things to the left of our left subtree here? They are exactly the things to the left of the entire tree, so we pass the same element a0. What about the right subtree? Here we need to take into account both the element we were given, a0, and what happened in the left subtree. Let's see how this works on our example tree. Given that initially a0 is 100, this value of 100 is passed to the left subtree. It is then passed on to the leaf, and we compute the result, 101. On the other hand, to the right subtree here we pass the result of combining 100 with the value of the left subtree; we will be passing 101. Therefore, when we reach the leaf 3, we compute 104. Now let's step back and look at what happens in the right subtree of the big tree. There we pass the combination of a0, which is 100, and left.res. And left.res, which is 4, is already stored up front in our result tree, so we did not need to wait for this computation, because it was done in a separate pass. That means we will be passing 104 down here as the initial element.
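The Leaf and Node cases just described give the following downsweep. The tree definitions and upsweep are repeated so the sketch compiles on its own, the recursive calls are again sequential stand-ins for the lecture's parallel construct, and toList is a small helper added here just to read off the result:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A,
                      r: TreeRes[A]) extends TreeRes[A]

def upsweep[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = (upsweep(l, f), upsweep(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}

// a0 is the reduce of everything to the left of the subtree t.
def downsweep[A](t: TreeRes[A], a0: A, f: (A, A) => A): Tree[A] = t match {
  case LeafRes(v) => Leaf(f(a0, v))
  case NodeRes(l, _, r) =>
    // The left subtree sees a0; the right subtree additionally sees l.res,
    // which upsweep already stored, so there is no waiting.
    val (tL, tR) = (downsweep(l, a0, f), downsweep(r, f(a0, l.res), f))
    Node(tL, tR)
}

def toList[A](t: Tree[A]): List[A] = t match {
  case Leaf(v)    => List(v)
  case Node(l, r) => toList(l) ++ toList(r)
}
```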
This also becomes the initial element for the left subtree, and the result is 112; for the right subtree we combine this 104 with the result here, so we will be passing 112, and the result is 162. We see that we have computed precisely the tail of the scanLeft result, and all we need to do is add the initial element at the very beginning of the tree. This, then, is the implementation of scanLeft on trees. We have seen how to compute the tree of results using the upsweep function, and we have just seen the downsweep function that, given the tree of results, produces the tree that is only missing the initial element. All we need to do is prepend that initial element. Can you define prepend? Prepend is just an operation on a binary tree. If we do not worry about balancing for the moment, then for a leaf we create a node containing a new leaf, namely the element we are prepending, together with the old leaf; for a node, we prepend only in the left subtree, leaving the right subtree as it was. This completes the definition of scanLeft, because upsweep and downsweep are parallel, we invoke each of them only once, and prepend is just a logarithmic operation. If we have an approximately balanced tree, we get a good parallel running time. The scanLeft on trees that we just presented was formulated to make the code as simple as possible. To make the implementation more efficient in practice, we would not store individual elements in the leaves of the tree but would instead use chunks of elements, represented using arrays, for example. I leave it as an exercise to define scanLeft on trees whose leaves store arrays. Now we go further and examine parallel scan applied to a collection represented as an array, so we start with just one big array. Interestingly, even in this case we use a tree to store our intermediate results. The tree of intermediate results looks very similar to the tree we had before.
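The complete tree version then reads as follows; upsweep and downsweep are repeated from before so that this sketch stands on its own:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A,
                      r: TreeRes[A]) extends TreeRes[A]

def upsweep[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = (upsweep(l, f), upsweep(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}

def downsweep[A](t: TreeRes[A], a0: A, f: (A, A) => A): Tree[A] = t match {
  case LeafRes(v) => Leaf(f(a0, v))
  case NodeRes(l, _, r) =>
    val (tL, tR) = (downsweep(l, a0, f), downsweep(r, f(a0, l.res), f))
    Node(tL, tR)
}

def prepend[A](x: A, t: Tree[A]): Tree[A] = t match {
  case Leaf(v)    => Node(Leaf(x), Leaf(v))  // ignore balancing for now
  case Node(l, r) => Node(prepend(x, l), r)  // prepend only on the left
}

def scanLeft[A](t: Tree[A], a0: A, f: (A, A) => A): Tree[A] =
  prepend(a0, downsweep(upsweep(t, f), a0, f))

def toList[A](t: Tree[A]): List[A] = t match {
  case Leaf(v)    => List(v)
  case Node(l, r) => toList(l) ++ toList(r)
}
```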
You can see that a node stores the left and right subtrees, as well as the value we have computed for them. On the other hand, there is a small change in the leaf. Because we want to process arrays efficiently, we stop the recursion when the chunks being processed are small enough, and we represent these chunks using, in this case, the indices from and to. We do not store the actual content of those array segments; we just store the indices into the big array and pass around a reference to that array. So we are not really storing values at all, only an indication of where those values can be found. Given this definition of the tree of results for arrays, here is the implementation of upsweep. Upsweep has a base case and a recursive case. The base case, as usual, is invoked when the segment we are processing is sufficiently small; the boundaries of the segment are given by from and to, and the input array is inp. Given a small enough segment, we simply apply reduce sequentially, and we store the value of the reduce as well as the indices from and to. We will see the implementation of reduceSeg1, which for the purposes of this algorithm can be done sequentially. What do we do in the recursive case? We split the array segment in two, approximately in the middle, and invoke upsweep on the two smaller segments. We get the two resulting trees, and then we also need to compute the result for our combined tree by applying the function f. This computation looks very similar to the original one; it is just that we start from an array as input, which we implicitly view as a tree. Here, then, is the reduceSeg1 implementation that works on array segments. Because we use it in the base case, we just give this sequential implementation.
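The array-based result tree and its upsweep can be sketched as below. The threshold constant and its value are choices made for this sketch, and the recursive calls are again written sequentially where the lecture uses the parallel construct:

```scala
// Result tree for array-based scan: leaves store index ranges, not values.
sealed abstract class TreeResA[A] { val res: A }
case class Leaf[A](from: Int, to: Int,
                   override val res: A) extends TreeResA[A]
case class Node[A](l: TreeResA[A], override val res: A,
                   r: TreeResA[A]) extends TreeResA[A]

val threshold = 2  // segment size below which we work sequentially

// Sequential reduce of inp(left..right-1), starting from a0.
def reduceSeg1[A](inp: Array[A], left: Int, right: Int,
                  a0: A, f: (A, A) => A): A = {
  var a = a0
  var i = left
  while (i < right) { a = f(a, inp(i)); i += 1 }
  a
}

def upsweep[A](inp: Array[A], from: Int, to: Int,
               f: (A, A) => A): TreeResA[A] =
  if (to - from < threshold)
    // reduce without an initial element: start from inp(from)
    Leaf(from, to, reduceSeg1(inp, from + 1, to, inp(from), f))
  else {
    val mid = from + (to - from) / 2
    // in the lecture: parallel(upsweep(inp, from, mid, f),
    //                          upsweep(inp, mid, to, f))
    val (tL, tR) = (upsweep(inp, from, mid, f), upsweep(inp, mid, to, f))
    Node(tL, f(tL.res, tR.res), tR)
  }
```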
Even though this implementation takes an initial element a0, the way we in fact invoke it is with the first element of the segment, inp(from), as the initial element, reducing from the subsequent index onward. That has the same effect as a reduce without an initial element over all the elements from from up to and including to - 1. Here is the downsweep on arrays. It takes the tree of results that we just computed, and we have a case for Leaf and a case for Node. When we have reached a Leaf, we do a sequential computation as usual; all the leaf holds are indices into the array, but we have the input array as well. So we do a sequential scanLeft for the segment between from and to, and we write the results to the corresponding indices of the output array. Looking at scanLeftSeg, you can see that there is no point doing anything unless left is less than right; otherwise we do the usual scan computation, implemented sequentially, that we have seen before. Given that scanLeftSeg, we have the implementation of our base case. What about the recursive case? In the recursive case we have two subtrees, left and right, so we just do a downsweep on these two subtrees: here we invoke it on the left subtree, and here on the right subtree, all done in parallel. Again, as before when processing the tree of results, the initial element for the left subtree is our own initial element a0, whereas the initial element for the right subtree is a0 combined with the result of reducing the left subtree. Now that we have seen upsweep and downsweep, we can implement the entire scanLeft on arrays. We first invoke upsweep on the input array between 0 and inp.length. Then we do the downsweep; downsweep writes the results into the output array, the writing itself being performed in the base case. Let's examine scanLeftSeg once more.
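The array downsweep and its sequential base case can be sketched like this; the tree definitions are repeated for self-containment, and the two recursive calls are sequential where the lecture uses parallel:

```scala
sealed abstract class TreeResA[A] { val res: A }
case class Leaf[A](from: Int, to: Int,
                   override val res: A) extends TreeResA[A]
case class Node[A](l: TreeResA[A], override val res: A,
                   r: TreeResA[A]) extends TreeResA[A]

// Sequential scan of inp(left..right-1); each result lands one index
// later in out, because i is incremented before the write.
def scanLeftSeg[A](inp: Array[A], left: Int, right: Int, a0: A,
                   f: (A, A) => A, out: Array[A]): Unit =
  if (left < right) {
    var i = left
    var a = a0
    while (i < right) {
      a = f(a, inp(i))
      i = i + 1
      out(i) = a
    }
  }

def downsweep[A](inp: Array[A], a0: A, f: (A, A) => A,
                 t: TreeResA[A], out: Array[A]): Unit = t match {
  case Leaf(from, to, _) =>
    scanLeftSeg(inp, from, to, a0, f, out)
  case Node(l, _, r) =>
    // in the lecture these two calls run inside parallel(...)
    downsweep(inp, a0, f, l, out)
    downsweep(inp, f(a0, l.res), f, r, out)
}
```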
You can see that, again, the output is written one position later than the input, because of the increment before the write. Going back to the overall algorithm, we therefore end up reading all the elements of the input array and writing them, shifted by one, into the output array. The only thing that remains is to prepend the element a0, which in this case, because the indices were set up appropriately, amounts to simply writing a0 to index 0 of the output array. This completes the description of the parallel scan on arrays.
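Assembling the pieces, the whole array-based scanLeft looks like this; it is a self-contained sketch collecting the functions above, with the parallel calls again written sequentially:

```scala
sealed abstract class TreeResA[A] { val res: A }
case class Leaf[A](from: Int, to: Int,
                   override val res: A) extends TreeResA[A]
case class Node[A](l: TreeResA[A], override val res: A,
                   r: TreeResA[A]) extends TreeResA[A]

val threshold = 2  // a choice for this sketch; larger in practice

def reduceSeg1[A](inp: Array[A], left: Int, right: Int,
                  a0: A, f: (A, A) => A): A = {
  var a = a0; var i = left
  while (i < right) { a = f(a, inp(i)); i += 1 }
  a
}

def upsweep[A](inp: Array[A], from: Int, to: Int,
               f: (A, A) => A): TreeResA[A] =
  if (to - from < threshold)
    Leaf(from, to, reduceSeg1(inp, from + 1, to, inp(from), f))
  else {
    val mid = from + (to - from) / 2
    val (tL, tR) = (upsweep(inp, from, mid, f), upsweep(inp, mid, to, f))
    Node(tL, f(tL.res, tR.res), tR)
  }

def scanLeftSeg[A](inp: Array[A], left: Int, right: Int, a0: A,
                   f: (A, A) => A, out: Array[A]): Unit =
  if (left < right) {
    var i = left; var a = a0
    while (i < right) { a = f(a, inp(i)); i += 1; out(i) = a }
  }

def downsweep[A](inp: Array[A], a0: A, f: (A, A) => A,
                 t: TreeResA[A], out: Array[A]): Unit = t match {
  case Leaf(from, to, _) => scanLeftSeg(inp, from, to, a0, f, out)
  case Node(l, _, r) =>
    downsweep(inp, a0, f, l, out)
    downsweep(inp, f(a0, l.res), f, r, out)
}

// Parallel scan on arrays: upsweep, then downsweep, then prepend a0.
def scanLeft[A](inp: Array[A], a0: A, f: (A, A) => A,
                out: Array[A]): Unit = {
  val t = upsweep(inp, 0, inp.length, f)
  downsweep(inp, a0, f, t, out)  // fills out(1) .. out(inp.length)
  out(0) = a0                    // prepending amounts to writing index 0
}
```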