Now we're going to modify our edit distance function from the last practical to instead do global alignment. So, if you recall, in the last practical for edit distance we have a penalty of one for skip characters and for mismatches. But in reality we might want to penalize certain kinds of errors more. For example, a skip character, we might want to penalize that more than just a simple mismatch. So we're going to do that. So I've already pasted in here my edit distance function, which I'm going to modify to global alignment. So to start with here, I'm just going to define the alphabet we're working with. And then I wanted to find a two dimensional array which will have all the penalties for the different mismatches we have. So I'm going to call this score. And this is, we have four characters in our alphabet. So, I'm going to make this a five by five array. So we'll have the mismatch of each character to each other character. As well as the skips, the pounding for skipping each character. So, the top left. Four by four of this array will have the mismatches in each character against each other character. So A against A is a match, so we're not going to penalize that at all. A mismatch of A to C would be a penalty of four. A mismatch of A to G would be a penalty of two. And a mismatch of A to T will be a penalty of four. So, since A and G are both purines right? >> Mm-hm. >> Since those are both purines, they're more likely to have a mismatch there than A against C or T, which are a different permadines is a different type of nucleotide, a different, slightly different structure. So we're penalizing the C and T more than G, right? And then the last character in this row will just be if there, if an A is skipped, so we're going to gives a penalty of E. So going to do this for each row now. So this is the row for C. So C against A is a penalty of four, C against itself is zero. Four, two, and then, for the skip is eight. This is for G, so it'll be two, four, zero, four eight. For T it should be 4, 2, 4, 0, 8 and then the last row would just be all skips. Just like that, did I get that right? >> Mm-hm. >> Okay, so this is our penalty matrix that we're going to use. >> Okay. >> So now, let's modify this edit distance functioning to use that. So I'm going to rename this, global alignment. The first three lines here are defining D. Just filling it out with zeros. Those don't have to change, at all. Now these next steps are where we initialize the first row and the first column. So, for example, the first row of this matrix corresponds to if we were to skip the first I characters of Y. So we have to calculate the penalties for the skips for that. So the top left character in our array is 0 and then we're just going to, for each character in the first row, we're going to take the character immediately to the left of it and add the penalty of skipping the current character in y. So I just want to take the range from one. Up to the N, since I don't want to touch the top left character. That will always be zero. And I take the character, i- 1, and I add the correct penalty from that score function, score alphabet.index of X to i minus one. So this alpha dot index will give us the index of the current character x in that so a number from one to four, that will tell us what road to look at in our score array. And then we're going to take the last value of that row, because that will give us the penalty for skipping that character. Does that make sense? >> Mm-hm. >> Okay, cool. And then we'll do the same thing for the column. So I want to go from one up to the end of y. I'll take character directly above it in the column. And now add on it, score -1. Let's take the last row of that array and then alphabet.index of y-1. >> You need to do i minus 1 instead of minus 1. >> Oh right, yes. Now, we just have to fill in the penalties here, for our horizontal distance, vertical distance and diagonal distance. So, the horizontal distance, the penalty here would be if we were to skip a character in y. So, we want to do this similar to what we just did above. Score the last row of that. I'll get the character at J-1 and Y. And so this will just add the penalty for skipping whatever is the current character in Y. And similarly for the vertical distance, we have to calculate the penalty for skipping the character in x. Now for the diagonal distance, this penalty will be the score for a mismatch between two characters. So, we'll just calculate the mismatch between the current characters in and x and y. Just like that, and then the rest of our function won't change. So we're still going to calculate the minimum of these edit distances for the current value in d. And then at the end of our function we just return the value in the bottom right corner of this already. >> So we only had to change one, two, three, four, five lines in the whole function. >> Yeah. >> It's pretty similar. >> Okay, so now let's test this out with a few strings. So I'm going to just make up a random string X. And I'll set Y initially to just be the same thing And the other distance between two identical strings as we would expect is zero. If we have a skip in here, we skip a character, now we have penalty of eight which is what we defined in our matrix If instead of a skip we have a replacement, say we replace G with a C. Now our penalty is four. >> That's the transversion penalty. >> Right. If we replace G with A, which is another purine now we only have a penalty of two. And if we have maybe a mismatch there and then a skip down here, now we have a penalty of 10 which is 2 plus 8 for the mismatch and the skip. So that's our global alignment function.