[MUSIC] So, now we will discuss how to implement sequence alignment in a more efficient way. Usually, we compare quite long sequences up to few tens of thousands of amino acids And runtime of alignment algorithms, as you know, is proportional to the number of edges, which is quadratic. And memory consumption, consumption of alignment algorithm is also proportional to the number of nodes, which is also quadratic. And memory is a, often bottleneck. Because on run time you can execute billions of configurations in one second, but to store billions of values in memory you need one or few gigabytes of memory. And, if you want to compare longer and longer, longer sequences, you still can wait for many seconds. When we, while your algorithms try to compute this alignment, but you don't have such huge amount of memory, which is needed, for algorithms. So how can we do this? Let's define middle column of our Manhattan grid as a column in the middle, and so it's quite simple. And we define middle node in Manhattan grid. It's a node where an optimal alignment path crosses the middle column. For example, in this grid, we have a, a, this highlighted node, it's a middle node. Because our optimal parts go through this node, and in this node is in the middle column. Let's design algorithm to find sequence alignment. We, first step, we need to find a middle node. Then, we need to find alignment between the source and middle node. And after that, we need to find alignment between middle node and the sink, since we know all parts of a path, we can do it recursively. So the only problem is how to find this middle node, because we don't know yet. Let's first discuss how to find alignment score in linear space. Previously, we used backtracking pointers to find the longest path, and it required quadratic memory. But what if we don't need this longest path, but we want to know only the score of alignment? Can we skip storing these backtracking pointers, and use less memory? Just to find alignment score without alignment itself. Let's take a look on our Manhattan grid and we have a first column and we know Aurelius in the first column, after that, we can compute all scores in the second column using the first column variables only. And that's the same as we previously discussed when describing alignment algorithm. But for third column, we don't need to know values of the first column, and we can simply forget them. And reuse this memory. Because for a third column, we can use only scores from the second column, and so on and so on and so on. And at each moment of this algorithms, we need to store only two columns. And we don't need to store all scores of all nodes in our Manhattan grid. And now we know how to find alignment score in linear space, but we want to find all alignment in linear time. So let's go back to our middle column and middle node. So, middle column was a column in the middle and middle node was a node where optimal paths crosses this middle column. Let's define i-paths. It's a longest path among all paths that visit i-th node in the middle column and for example, lengths of zero paths is two here, and lengths of four paths is four. And, let's try to compute these numbers. Lengths of all i-paths in the middle column. For example here, in this graph, it's two, three, three, four, three, and one. And four, our middle node, is a maximum node in this vector of lengths of i-paths. So, lengths of i-paths. We should be able to write down this length of i, is consists of from, fromSource i and toSink i. Which is two parts of our path from source to this node, and from this nord, node to the sink. And we can find all values from source i just by going through the first part of Manhattan grid, using our linear space algorithm to calculate alignments course results alignments itself. And we also can find a vector of failures from sink by going in the second part of Manhattan grid, but by course. So, we'll start at our sink, and it'll go by backward edges, trying to reach this middle column, also in linear space. So we will know alignments course, but we won't know alignments which led to this course. And after we have this two vectors, from source and to sink, we can sum them up and find values in a middle column. So how many, what is the run time of this algorithm So to find from source, we need to visit area divided by two of our whole Manhattan grid. And to find toSink vector of values, we need to also visit area divided by two. So we actually need to visit whole Manhattan grid. To find values in a middle node. So isn't it too large? Using the same run time as before, we found only middle node. So we found only single node, which belongs to our optimal path, and it required quadratic time. So, what can we do next? We can split our Manhattan grid into four parts and highlighted parts here should be visited after you've found middle node, because our optimal path stay somewhere in these areas. But we can skip other two areas from our considerations on the next steps of algorithms. So on each of this highlighted part, we need to find, we don't know, how many time we will take. The area of this node is proportional to the half of our area of whole Manhattan grid so, we need only half of the time which were required to find single middle node to, find two more middle nodes in this separated areas. So, after this step you will have three nodes which belongs to our optimal alignment paths and we can iterate more and more, so, on the next step we will split our area into smaller rectangles and we will find middle node in each rectangle, and so on and so forth. And on the third level of recursion, the total area which we need, to take a look on, is proportional to the area divided by four. And on the fourth step, it's even smaller and smaller. And during the whole algorithm. Firstly, we visit, we visit that whole area to find single node. Single middle node. And the next step, we visit that area divided by two to find more nodes in our optimal alignment path. And on the next step we see that area divided by four, and so on and so on. And this progression sum up to two times area. So it's actually linear that, linear run time of this algorithms. Linear by area of our Manhattan grid. You can see the illustration here, for example, for a first pass we need to take a look on that one area. On the second pass we need to take a look on that half of area. And, on the third pass. It's fourth, and so on and so on. And you can easy take a look here that it summed up to two. So what have we done? We designed algorithm which required only linear space. Because every step in this recursion requires linear space to find a middle node in a given area. And runtime of this algorithms is still linear to the, our, our number of nodes and number of edges in alignment graph. And by using this technique. Its a little bit slower but by using this technique, we can slow down a little bit of run time but still find whole alignment in linear space.