Hi. In the previous video we proved that our rule for shifting the pattern on all the text is safe. That is, we won't miss any of the occurrences of the pattern in the text by using that rule. However, the rule involves finding the longest border of some string. We need to learn to do that efficiently and prefix function is a tool that will help us doing that. This is actually the central tool of the Knuth-Morris-Pratt algorithm. In this video, we will study the properties of prefix function. By definition, prefix function of a string P is a function that returns for each position i in the string, the length of the longest border of the prefix of the string ending in position i. Let's look at an example. String P is a, b, a, b, a, b, c, a, a, b and below it are the values of the prefix function. We will start with the shortest prefix, which consists of just one character. It corresponds to the prefix P from zero to zero. The longest border for this string which consists of just one character, it is just an empty string. Because it has to be a substring of the string and it has to be not the same as the whole string. There is only one such substring which is empty string. The length of the empty string is zero and thus the value of the prefix function for the prefix a is zero. For the prefix a, b, the prefix function is again zero because the longest border must be not just the substring of a, b, it also must be such string that is both a prefix and the suffix of a, b. But there is no such non-empty string because a is not equal to b. Again, the longest border is an empty string and its length is zero. Let's look at the prefix a, b, a. This prefix already has a border. It is character a, which is both a prefix and a suffix and it is actually the longest border. The prefix function is one. For the string a, b, a, b. It has border a, b, which is both a prefix of length 2 and the suffix of length 2. This prefix function is equal to 2. For the string a, b, a, b, a, the prefix function is equal to 3 because there's already a border of length 3, which is a, b, a. The first three characters are a, b, a and the last three characters are a, b, a. Although they intersect, this is still a valid border. For the string a, b, a, b, a, b, we have a prefix a, b, a, b, which is also a suffix a, b, a, b and although they intersect, this is still a valid border, so the prefix function is equal to 4. We'll look at the string a, b, a, b, a, b, c and we see that there is no prefix that is equal to suffix because there is letter c, which only occurs as the last letter of the string. For the string a, b, a, b, a, b, c, a, there is a border of length 1. For the next string also the border of length 1 and for the last prefix, which is the whole string P the prefix function is equal to 2 because there is a prefix a, b which is equal to suffix a, b and there is no longer border. We will prove several properties of the prefix function now. First, we will prove that prefix ending in position i always has a border of length prefix function for a position i plus 1 for the next position minus 1. Why is that? Well, first, we consider the longest border w of the prefix starting in position 0 and ending in position i plus 1, as on the slide. Now, we'll cut the last character of this border, both from the prefix and from the suffix. We'll get some substring w prime. Notice that this substring w prime is both a prefix and a suffix of the P from zero to position i from this prefix of string P, so it is a border. We actually have just proved that the prefix ending in position i always has a border of length prefix function for the next position minus 1. Because we can just take this longest prefix for i plus 1 and cut out the last character and get the border for the previous prefix of P. A corollary from that is that the longest border for position i plus 1 is not longer than the longest border for the previous position plus one, because there is always border which is just one character shorter for the previous position. For the next position, the longest border can be longer by more than one. Another important lemma is that, if prefix function is not zero for some position i, then all borders of the prefix ending in position i but for the longest one are also borders of the longest border of this prefix, which is the prefix from zero to position s of i minus 1. Let's look at the picture. We have position i and we have the longest border of length s of i and I will state that all the borders of this big prefix from zero to i but for this longest green one, all the other borders of this prefix are also borders of the string which is the first green prefix. To prove that, let's look at any border of the prefix ending in position i which is shorter than s of i. This is a prefix of this string, and it is also a suffix ending in position i. As the green strings are equal, the suffix ending in position i is also suffix of the first green part. We see that this string u is both a prefix and the suffix of the first green prefix of length of s of i, which means that it is indeed a border of this string starting in position zero and ending in position s of i minus 1, which is the first green part. We proved they're alike. An important corollary from the previous statement is that all the borders of the prefix of P ending in position i can be enumerated by first taking the longest border of this prefix, and then taking the longest border of this border, and then taking longest border of this boarder and so on. Why is that? Because first and longest border of the prefix ending in position i is the border b_1. We just proved that all the other borders of this prefix ending in position i are also borders of this b_1. To take the longest out of all the others, we take just the longest border of b_1, which is b_2, and now, all the borders of b_1 are also borders of b_2, but for the longest one. To take the longest out of the ones we didn't enumerate, we just take the longest border of b_2, which is b_3, and so on. This is a way to enumerate all the borders of the prefix ending in position i in the order of decreasing length. Take the longest border, take the longest border of the result, and so on. We can do all of that by using prefix function because prefix function, at position i, says the length of the longest border of the prefix ending in position i. Then prefix function at the ending position of this border says the length of the longest border of b_1, which is b_2. Then the prefix function at the last position of b_2 says the length of b_3 and so on. We can use prefix function to compute all the borders of the prefix ending in position i, assuming that we already know all the values of the prefix function for all positions up to i. Now, how to actually compute prefix function. We're going to compute prefix function in position i plus 1, assuming that we know all the values of the prefix function for positions from zero to i. it is easy to compute prefix function for a position zero, it is always equal to zero, and if we can compute si plus 1 based on all the previous values, then we can compute all the values of the prefix function, like in mathematical induction, one by one. What happens when we add one more position? Let's first consider the longest border of the prefix ending in position i, and then consider the next character, which is i plus 1, and the next character after the prefix of length s of i. If these characters are the same, then we can just add 1 to the prefix function of position i, and we will get si plus 1. Why is that? Well, first we already know property that si plus 1 is less than or equal to s of i plus 1. Also in this picture we see that if this corresponding characters are equal, then actually we have a border of the prefix ending in position i plus 1 of length s of i plus 1. In this case, s of i plus 1 is at least s of i plus 1, and it means that is both less than or equal and greater than or equal to s of i plus 1, so this is an equality. Another case is when the next character at position i plus 1 is actually different from the next character after the border of length s of i. What should we do in that case? These are different and we want to find such position where we will have character x after some string which is equal to the suffix of the corresponding prefix ending in position i. This yellow string then will be both prefix and suffix of the string from zero to position i. Then it will be border of the longest border of our prefix ending in position i, so s of i plus 1 in this case will be equal to the length of some border of the longest border of the prefix from zero to i plus 1 because it will make load to the length of the yellow part, which is some border plus 1. Basically, this is the main idea for computing si plus 1, it is either equal to the prefix function of the previous position plus 1, or it's equal to some other border of the prefix ending in position i plus 1, and we know how to enumerate all the borders of the prefix ending in position i using the values of the prefix function from zero to i. From all this knowledge, we can actually build an algorithm to compute all the values of prefix function efficiently, and we will do that in the next video.