Hi, in this video you'll learn the algorithms used in the initialization phase of the suffix array construction. And those are sorting of the single characters of the initial string and also competing equivalence classes of those characters. First we know that we assume the alphabet is finite. And we can use thus counting sort to compute order of the characters. You probably remember a counting sort from the first course, algorithm tool box. If you don't, I encourage you to go through those lectures once again, because we will use counting sort twice in the construction of the suffix array. In the initialization phase and in the transfer phase. Here is the pseudo codes for the procedure SortCharacters, which takes string as an input and returns the order of the characters of that string as the output. We start with initializing that order as an array, of size equal to the length of the string. And we'll also need another array count, used in the counting sort. Which will initialize with zeroes, and which has size equal to the size of the alphabet, not the size of the string, but the size of the alphabet. And then what you have in the following two for loops is just the familiar code for the recomputation phase of the counting sort. We count the number of occurrences of each of the characters in the string and then we also compute the partial sums of that array. And then in the end, we go from the right to left in our string S. We look at the character and we know that the partial sums array contains the position after the position where this character should be in the order. So we decrease the counter by one and we save our character position in the corresponding cell of the array order and in the end we just return the order of the character. So this is just an implementation of the counting sort as applied to characters of string has. And it works in time proportional to length of the stream plus size of the alphabet. because we know that this is the running time of the counting sort for length of S items, each of which can take only size of the alphabet, different values. And I need to note here that typically the size of the alphabet is small like for example, four letters, four streams in a genome, or 26 characters. If we are only working with the English words, or maybe alphanumeric characters. Then there will be 26 small letters, 26 big letters, and 10 digits. But sometimes the alphabet can be very very big, such as Unicode. And in this case counting sort might not be appropriate. If your string for example has only 1000 characters but those are all unique code, and the alphabet size is a few million character, then maybe you could sort the characters of this string in a more efficient way. Apart from sorting the characters, we will also need additional information to make the following steps of the algorithm more efficient. And to do that, we introduce equivalence classes of the partial cyclic shift. So we denote by c with index i, partial cyclic shift of length L, where L is the current length of the cyclic shifts, which we already have sorted. And initially, we have sorted single characters. So L is equal to 1. And then on the further phases of the algorithm, L will increase from one to two, to four, and so on, twice in each iteration. So, some of the cyclic shifts can be equal to different cyclic shifts starting in different positions. Ci can be equal to Cj and then they should be in the same equivalence class. So to assign equivalence classes, we define the area class. And class of i is equal to the number of different cyclic shifts of length L that are strictly smaller that the cyclic shift starting at position i. So for different cyclic shifts which are equal, the value of class[i] and class[j] will be the same. Because the same other cyclic shifts are smaller than these two equal cyclic shifts. And we'll need to compute this array class to increase the speed of the next phase. And before computing this array class, we assume that we have already sorted all the cyclic shifts of the current length L. So, how to actually compute the classes of the cyclic shifts when we already know their order. Let's look at the example of sorted characters of the string. So we know already that the characters are sorted, and their order is 6, 0, 2, 5, 1, and 3. Now let's assign classes. We want to assign class 0 to the smallest of the cyclic shifts of the current length. Which is dollar, which is in position six. So, we write 0 in position six of the class. And we initially set up a class to be of length equal to the length of the string of course. The next, smallest cyclic shift is letter a and it is different from the previous smallest one which is dollar. So we need a new equivalence class for a. And so, we assign 1 to the equivalent class of a which is in position 0 in the initial string. So we assigned 1 to class of 0. And the next one is also a which is already in position two. But it is equal to the previous one. So we are saying the same equivalence class to it. So we'll write down 1 as the value of class of 2. The next one is also a. It is also equal to the previous one. So we assign 1 to class of 4. And the same one we do with class of 5. And the next one is b which is different again from the previous one and so we assign new class which is bigger by one which is two so we assign 2 to the value. Value of class of one because b we find in position 1. So class of 1, we assign to value 2. And then the last one is also b it is equal to the previous one so again we assign 2 to class of 3. And now we know the classes of all the single character cyclic shifts. We know that the smallest one is dollar. And it is the only one that's equals 0. We know that 4 a's are in the equivalence class 1. And we know that 2 b's are in the equivalentce class 2. Here is the pseudo code for the algorithm ComputeCharClasses which takes its inputs string S, and the order of the characters. And computes the equivalence classes just for single character cyclic shifts of the string S, given their order. So we initialize the array class with just an area of size equal to the length of the string S and that will be our return value. Also initialize the first value of this class array. But we don't initialize class of 0. We initialize class of order of 0, because order of 0 is the position in which the smallest character in the string occurs. And we initialize this character with class 0. So we assign class of order of 0 to 0 saying that the character in position order of 0 has equivalent class of 0. And then starting from second character in order up to the end, we go through the characters of string in order and we assign classes. To assign a class to a new character, we compare it with the previous one in the order. If it's different from the previous one means it's bigger because we go through them in the order. And so we need a new class. And so we just take the value of the previous class, increase it by one and assign to the class of this character. Otherwise, if this character is the same as the previous one, we don't need to create a new class. We just assign the same class as the class of the previous character. And in that, we return the array with the classes. And we state that the running time of this algorithm is linear. Which is obvious, because we only have initialization of the array. And then for loop, which runs for linear number of iterations with constant number actions performed in each iteration of the for loop. And that's all for the initialization phase of the suffix array construction. And in the next video, we'll learn the transition phase from the current length to twice the length of the cyclic shifts.