0:00

Hi, in this video you will learn how to implement the transition phase of

the suffix array construction algorithm.

In the transition phase,

you assume that you have already sorted cyclic shifts of some length, L.

And you know not only their order but also their equivalence classes,

and you need to sort based on that cyclic shifts of length 2L.

The main idea is the following.

Let's denote by Ci cyclic shift of length L starting in position i,

and by Ci prime the doubled cyclic shift starting in i.

That is, cyclic shift of length 2L, starting in position i.

Then, Ci prime is equal to Ci concatenated with Ci + L.

So we just take string Ci, we take string Ci + L, put it after string Ci.

And the total string of length 2L is equal to string Ci prime.

And so to compare Ci prime with Cj prime it's sufficient

to separately compare Ci with Cj, and Ci + L with Cj + L.

And we already know the order of the cyclic shifts of length L.

So instead of comparing them directly, we can just look in the array

of their order and determine which one is before which one.

And that one is going to be smaller or the same as the other one.

And also, we have the area with equivalence classes.

And so we can determine whether two cyclic shifts of length L are really equal,

or they're different, by looking in the array for equivalence classes and

comparing their equivalence classes.

So basically we can compare two cyclic shifts of length L in constant time.

And that is why we can sort the doubled cyclic shifts faster.

1:58

For example, if S is our initial string, ababaa$, and the current length L is 2.

And position i is also 2, then Ci is C2,

is a cyclic shift starting in position 2, which is ab.

Ci + L is C2 + 2 which is C4, which is aa.

2:20

And Ci prime is equal to abaa.

So this is the distinction between cyclic shifts of length L and

2L, and how we combine C2 and C4 to get C Prime 2.

So now we have to think about the following problem.

We need to sort pairs of numbers basically,

because each cyclic shift of length L corresponds to its number

of position in the order of all cyclic shifts of length L.

And we first need to sort by second element of pair,

and then we stable sort by the first element of pair.

And if we do these two steps, then our pairs will be sorted

because they will be sorted by first element.

And also inside the equal first elements it will be sorted by the second element,

because it was initially sorted by the second element and the sort is stable.

So we didn't break the order of the second element in the case when the first

elements are the same.

So, this is the idea for sorting pairs of objects.

3:49

Now for each of the cyclic shifts of length 2, let's look

at the cyclic shift of length 4 which ends in this cyclic shift of length 2.

So we take the two previous characters and add them to the left.

So C4 prime and in C6, and also we'll look at C5.

We take the two previous characters, and C3 prime ends in C5, and so on.

So we go by two characters to the left from

each of the cyclic shifts of length 2, and

we get a set of cyclic shifts of length 4.

Now we have highlighted in yellow the first elements of the pairs,

which are also cyclic shifts of length 2.

Those are not sorted, but we know their starting positions and

we know what are the correct starting positions in the sorted order.

So we can reorder this list of cyclic shifts of length 4 by the order

of the first halves of the elements in this list using the known order.

And we will need to do so in a stable sort fashion so that if,

for example, C2 prime is before C0 prime, and

the first half of C2 prime and C0 prime are the same.

They need to stay in the same order in the final sort.

5:23

And the same goes about C3 prime and C1 prime.

They both start in ba.

So when we sort by the first half, C3 prime has to stay before C1 prime.

That's our requirement.

So suppose we manage to sort the first halves in such a way, and

we started the whole cyclic shifts of length 4 accordingly, what do we get?

We actually get the sorted list of cyclic shifts of length 4.

And of course for those which differ in the first half,

it's obvious that they compare in the correct order.

But for those which are the same in the first half,

their second half is also sorted because it was sorted initially

in the second column and we implemented a stable sort.

So C2 prime is still before C0 prime, and C3 prime is still before C1 prime.

So this is the idea.

6:25

For sorting double cyclic shifts we take Ci prime, which is a double cyclic

shift starting in position i, and we know that there's a pair of Ci and Ci + L.

And already know that the single cyclic shifts are already sorted.

C order[0] is the smallest one, and C order |S|- 1 is the biggest one.

Now let's take the doubled cyclic shifts starting exactly L to the left,

counter clockwise from those.

And then C prime order of [0]- L, C prime order of [1]- L, and so on

are sorted by second element of the pair of single cyclic shifts, sorted already.

And when I decrease order of 0 by L, I mean decrease

modulo the length of the string because this is a cyclic string.

So we need to do a cyclic subtraction.

So we get these C prime order [0]- L and so on sorted by the second

element of a pair, and we need only a stable sort by first elements of pairs.

But we know that counting sort is stable, and

we know equivalence classes of single shifts for the counting sort.

And we know that there are not many different single shifts.

At most their number of different single shifts is length of string.

So we can again use counting sort to sort those