In this practical we'll be implementing edit distance using dynamic programming. So I've already pasted here the recursive edit distance function which was covered in lecture. However, this function is pretty slow so we're going to rewrite it using dynamic programming. So I'm going to create a new function at a distance which will take two strings, x and y. We're going to compute the edit distance between x and y. So the basic idea of dynamic programming is we're going to create a matrix that we're going to fill in with the edit distances for the substrings that we already computed to save time, so we don't have to recompute them multiple times. So I'm going to create a matrix D, and I'm going to initialize this as an x by y array of zeroes. Or, sorry, an x + 1 by y + 1 array of zeroes, because we want to start from the empty string. So, this will initialize D as an x + 1 by y + 1 array of zeroes. And now, we want the first column of that array to be zero, one, two, three, four, and so on. And we want the first row similarly to be zero, one, two, three, four and so on. So I'm going to say for i in range up to length of x. Should be x + 1. >> These are initialized this way because any prefix of say length x and an empty prefix. The added distance between those two is just going to be x. So that's why we initialize the first row on the first column in that way. >> Right. So now we want to step through and fill in the rest of our matrix. So I'm going to go row by row. So for each row n for each column. Now we need to compute the value that will go at this location in D. So we can get to this location either coming from the cell directly to the left, which would correspond to a skip in y. The cell directly above which would correspond to skipped character in x, or the character above and to the left. So, if we move horizontally from the cell to the left, this would be, the edit distance of this cell would be value of the cell right to its left. So i j + 1, + 1 since the penalty for a skip character is one. And similarly, we can calculate the vertical distance that would be coming from the cell above it. This would be the edit distance of the cell above it, + 1. For that skipped character. And now, if we come diagonally, the penalty for the edit distance depends on whether the characters match. If the two characters right here match, then the edit distance won't increase. So I can say if x(i- 1) equals y(j- 1), then the diagonal at a distance is just going to be the same as the value above and to the left. But if these characters don't match, then we have to add one for the penalty for that. So now that we have three possible edit distances for this cell. We want to minimize this, so we're just going to take the minimum value for this cell. So I'm going to say min of the horizontal, vertical, and diagonal at a distance. So we'll finish this loop. This will populate the entire matrix with edit distances, and then we want to get the value in the very bottom right, get the edit distance for the entire strings against each other. So I can just do that. That will return the very bottom right value of that array. And this is our edit distance function. So let's test this out. I'm going to start by running the edit distance recursive function. And we'll see how long that takes and then we'll compare the time against our new dynamic programming version. So this line will just tell iPython to print out the time that it takes to run this. And say x equals. Just give it two strings that are somewhat similar here And then I'm going to Calculate the recursive edit distance for x and y. And this is running. You can see it's already taking a bit longer than we would expect. These strings are only about ten characters long. Takes about five seconds, and the edit distance is three. You can see we have a upper and lower case character here. We have a s matched up against the space here, or, that would be an inserted space here, and then an inserted r here, for an edit distance of three. So now let's do the same thing, but use our dynamic programming version. When I run this, it computes in about 200 microseconds, I think, so that's really fast. >> That's better. >> Edit distance is the same so it works, so that's the dynamic programming version of edit distance.