0:00

Hi, in the previous video, you've learned that we need to quickly compute

longest borders of different prefixes of the pattern.

And in this lecture you will learn the notion of the prefix function,

which helps to do exactly that.

And we will study some of the properties of the prefix that allow it to compute

it fast.

Prefix function of a pattern P is such function that for

each position i in the pattern,

returns the longest border of the prefix of the pattern ending in this position i.

Let’s consider an example.

Here's a string P and we consider the first prefix of this string,

a, for this prefix there is no border.

So, prefix function is 0, now for the prefix a b,

ending in the position number 1, there is also no border,

because a is not equal to b, and so, the prefix function is again zero.

For prefix a b a, ending in position 2,

the longest border has length 1, and it is border a.

For the string ab, a b the longest border is a b of length two,

and for the next prefix the longest border is already of length 3.

And although the prefix aba, and

the suffix aba intersect, this is still a valid border.

And for the next string, the longest border is a b a b of length 4.

And then we meet character c, and the prefix function begins from zero again,

because there is no border for the string a b a b a b c.

And for the next string, the longest border is a.

For the next one, again, a, and for last one ab.

So here is the Prefix function.

Now we will prove a useful property of the prefix function that

the prefix ending in position i has a border of length s(i+1)-1.

To see that, let's first consider the longest border w of the next

prefix ending in position i+1.

W has length exactly as of i+1, by definition of the prefix function.

Let's consider the last character of w and cut it out.

What's left, denoted by w prime, is a border of the prefix ending

in position i and it's length is exactly s(i+1)-1.

So we've just proved the property, so

we see the prefix ending in i has a border of length as, i + 1- 1,

and it means that the longest border of the prefix ending in position i,

is at least s(i + 1)- 1.

From this we get an immediate corollary that the prefix function cannot grow very

fast when moving from position to the next position.

In particular it cannot increase by more than one.

As we saw in the example, it can of course decrease, or stay the same,

but it cannot increase by more than one from position to the next position.

In the algorithms to follow, we will need to efficiently go through

all the borders of a prefix pattern and this Lemma helps us with that.

It says that all the borders of some prefix of the pattern, but for

the longest one, are borders of this longest border in term.

And to see that, let's look at the longest border, and

some border u of the prefix, which is shorter than the longest border.

And then we can see that this border u is both a prefix, and

the suffix of the longest border, and it was also shorter than the longest border.

So u is indeed a border of P[0..s(i)- 1].

And the useful corollary from that is that all the borders

of P[0..i] can be enumerated by simple ways.

First we take the longest of the borders, and then we find the longest

border of that string, and then we find the longest border of that string,

and so on until we get an empty string as a border.

And by the end,

we have gone through all the borders of the initial prefix of the pattern.

And to go from any prefix to its longest border,

we just need to use the prefix function.

And then again, the Prefix Function for that prefix of P, and

then the Prefix function for that prefix of P and so on.

So if we know the Prefix Function of the pattern we can go through all

the borders of any prefix of the pattern In an efficient way.

Going only through the borders and

not encountering any other positions in the pattern.

Now lets think how to compute the prefix function.

We know that s(0) is 0 because the prefix ending in position 0 has length 1 and

has no known empty borders.

Now to compute s(i + 1) if we already know the values of the prefix function for

all the previous positions,

let's consider the longest border of the prefix ending in position i.

And let's look at the character and position i + 1 and

the character right after the prefix with length s(i).

If those characters are equal than s(i) + 1 is at least

s(i)+1 because we can just increase the length of the border.

And that means that s(i+1) is exactly equal to s(i)+1 because we've

learnt that the prefix function cannot grow by more than One from position to

the next position but

if those characters are different then everything is a bit more complex.

So we know that there is border of the prefix ending in position i that has

length exactly as i+1-1.So if we find that border then the next green

character after it will be the same as the character in position i+1.

It will be the same x.

So what we need to do is we need to go through all the boarders of

the prefix ending in position i by decreasing length.

And as soon as we find some boarder that the next character after it

is the same as the character and position i plus one,

we can compute as of i plus one as the length of that boarder plus one.

So now we basically have the algorithm for

computing all the values of the prefix function.

We start with initializing s(0) with zero, and

then we go and compute each next value of s.

If the character s(i+1) is equal To the character right after the previous border.

Then we just increase the value of the prefix function by one and go ahead.

If those characters are different, we go to the next longest border of the prefix

ending in position i using prefix function and look at the next character after it.

If it coincides with the character in position i + 1 then we found the answer.

Otherwise we again go to the next longest border using the prefix function and

look at the next character after it and so on.

At some point, we may come to the station that the longest border is empty.

And then we'll need to compare the character in position i + 1 with

the character in position 0.

And either they are the same, and then the prefix function is 1.

Or they are different, and then the prefix function has the value of 0.

Now you know a lot of useful properties of the prefix function but

we still don't know exactly how to compute all of its values and

you will learn that in the next video.