0:00

Hi, in this video, you will learn how to compute LCP array in a linear time.

And the main idea is the following.

We'll start by computing LCP of the first two smallest suffixes directly by

comparing them character by character.

But then, on each next iteration,

instead of going to next pair of suffixes in the suffix array,

we move the smaller suffix in the stream one position to the right and

then compute its LCP with the next suffix in the suffix array.

So we won't go in good order in the suffix array, we will go in some strange order.

But this order is good.

It will show that if we go in this order through the smaller suffixes,

then the LCP of the smaller suffix and the next suffix will decrease by,

at most, one on each duration.

And so, we will know that most of the characters of the two new suffixes,

we have already compared many of them, and we don't need to compare them again.

We'll start from there, and we will convert the next character and

the next one directly, and the LCP itself will be very easy to compute,

because we will still do that by direct comparison of characters with characters.

We will just avoid some of the comparisons,

because we will know from the previous durations that the common prefix has

at least such length, and we don't need to compare the first such many characters.

And in the end, it turns out this will work in linear time.

So we will denote by A and of Pi the suffix, starting

in the next position in the stream, after suffix Ai, in the suffix array.

So the next one in the suffix array will be Ai plus one, but we won't know that

one, but the one which starts in the string one position to the right.

1:54

So here's an example.

We have a string, which is ababdabc, and

the smallest suffix is ababdabc.

The whole string is actually the smallest suffix, and

the next one in the suffix array is abc.

And their longest common prefix is ab, and here we see it.

And we compute this longest common prefix, which is equal to two, by the length,

directly.

2:19

And then, we will know that if we move to the next

two suffixes in the stream to the next one after a zero and

the next one after a one, those will both start with letter B.

So the last of the common prefix decreased by, at most,

one, cuzboth suffixes just moved one position to the right, and

we cut away only one position of the longest common prefix.

Of course, these two suffixes are not, probably,

two neighboring suffixes in the suffix array in the general situation.

It might be so that there is some suffix between them.

But because of the property of the LCP array, the longest common prefix

of the smaller suffix with the next suffix in the suffix array will be even bigger or

at least the same as its longest common prefix with the next suffix in the string,

because the next suffix in the string is bigger than the smaller one.

And by the property of LCP array,

the common prefix with the next element is the same or

bigger than the common prefix with some element farther away in the suffix array.

So now, we can move to the next element from the smaller one and

then take the next one to it in the suffix array, and compute their

L speed directly but remembering that the first several characters,

we don't need to compare exactly those, which are in the LCP of the previous pair.

So, this is basically the algorithm.

We compute LCP(A[0] and A[1]) directly and save it's value as variable LCP.

And then on each iteration, first suffix in the pair, which is smaller,

goes to the next in the string, then we find which one is the next

in the suffix array in the order, and we compute their longest common prefix

knowing that we don't need to compare the first lcp- 1 characters.

4:26

And then on each comparison, if it's successful, we increase LCP, and we go

to the next comparison, and we repeat that until we feel the whole LCP array.

And the idea is when we make each comparison, we increase LCP.

And when we move to the next pair, we decrease LCP by at most one.

And this is why we cannot do too many iterations.

So the Lemma states that this algorithm computes LCP array in linear time.

5:02

It either finishes the iteration, and

number of such comparisons is at most number of iterations, and we have at most

length of the string iterations, or if it is a successful comparison,

then it increases the current value of variable LCP.

5:18

And the variable lcp cannot be bigger than the length of the string in any moment,

and at each duration, lcp decreases by at most one.

So if we start from zero, we cannot go higher than length of S,

and at each iteration, we decrease by at most one.

We cannot do more than linear time of increasing LCP,

and so we cannot do more than linear number of comparisons.

And this is why this algorithm works in linear time.

And as soon as we can now compute the LCP array, we can proceed in the next video

to construct the suffix three given the suffix array and the lcp array.