Hi. In this video, we are going to study the initialization part of the genome construction strategy. In it, we're going to first sort the characters of the initial string and consider them the partial cyclic shifts of the string of length 1. Then we will also introduce the notion of equivalence classes among partial cyclic shifts of the same length, and we will compute those equivalence classes for the partial cyclic shifts of length 1. Let's start with sorting the single characters. We know that there is an alphabet and it has only size of the alphabet different characters and we can use actually counting sort which you could have learned in one of our previous courses to compute order of characters. Here is the procedure SortCharacters which basically implements counting sort applied to characters of string S. It initializes order which will store the result with array of size, length of the string, and also initializes an auxiliary array count which is used for the counting sort and it is filled with zeros. Then we compute the number of times each character occurs in the string, we consider that each character has index from zero to size of this alphabet minus one. Then also store in the same array, instead of counts of characters, number of characters up to jth character which occur in the string S. Then goes the last loop of the counting sort where we go from the biggest character to the smallest and we save the positions in which these character occur and in the end, we get the order of the characters according to their sorted order but will also store the character at the corresponding position in the string S as corresponding position in the sorted order, and then we return this array with the order of the characters. This is just the counting sort. Now, lemma states that the running time of this procedure is proportional to sum of the length of the string S and size of the alphabet and that is based on the fact that counting sort for N objects which can take M different venues works in time proportional to N plus M. Now let's introduce the notion of equivalence classes for partial cyclic shifts of the same length. We denote by C_i partial cyclic shift of length L starting in position i. For some i and j, C_i can be equal to C_j. Those strings which are partial cyclic shifts of length L starting in position i and j can be equal and then we say that they are in the same equivalence class. To quickly determine whether two partial cyclic shifts are in the same equivalence class or not, we're going to compute their class number. Class of i is by definition the number of different partial cyclic shifts of length L that are strictly smaller than the partial cyclic shift, C_i, starting in position i. I stated that two partial cyclic shifts are equal if and only if their class numbers are equal. If two partial cyclic shifts are equal, then of course number of different cyclic shifts of length L that are strictly smaller than each of them are equal. On the other hand, if the class numbers are equal, then the cyclic shifts also have to be equal because otherwise if they're not equal, then one of them would be smaller than another. For example, C_i is smaller than C_j. Then for the class number of C_j class of j, not only all the partial cyclic shifts that are less than C_i would be included in this number but also the C_i itself would be included in this number. The class number for C_j would be bigger than the class number for C_i. If the class numbers are indeed equal, then the partial cyclic shifts are also equal. These class numbers would allow us to determine in constant time whether two partial cyclic shifts are equal or not. The only problem is how to actually compute those class numbers. Let's look at an example first. Our example string is ababaa$ and we've already sorted the single characters of the string and we get the order of them, 6, 0, 2, 4, 5, 1, 3. This is the order of all the characters and the order contains the positions of these characters and the positions are ordered in the increasing lexicographic order of the characters themselves. Now what we want to build is the array class. An array class for each position in the string should have the number of partial cyclic shifts of length 1 which are strictly less than the partial cyclic shift starting in this particular position. We're going to start from the smallest character, from the smallest partial cyclic shift. It is the dollar, and it is at position 6. We need to write down something at position 6 of the array class and that number that we need to write is 0 because this is the smallest partial cyclic shift. There are no other partial cyclic shifts which are even smaller than it. We initialize position number 6 in the array class with 0. Now we move to the next partial cyclic shift in the lexicographic order. If it is different from the previous in the lexicographic order, as in this case, it means that the previous one is also smaller than the current one, and so the class number is bigger than the previous class number by one. Now we're looking at partial cyclic shift a at position 0. We need to fill in position 0 in the array class, and we need to fill it in with number 1 because it must be bigger by one than the previous number because the new partial cyclic shift the next one in the order is bigger than the previous one. Now we'll look at the next partial cyclic shift. It is again a, so it is the same as the previous partial cyclic shift and it means that the class number will be the same. What is the position? The position of the partial cyclic shift we're looking at is 2 so we need to fill in the position 2 in the array class and we need to fill it in with the same number 1. The next partial cyclic shift is also equal to the previous one so we need to fill its position 4 with number 1 and we fill position 4 in the array class with number 1. The next partial cyclic shift is identical to the previous one, so we need to fill its position 5 with number 1. We do that. Now the next partial cyclic shift is b, is bigger than the previous partial cyclic shift so we need to add one to the last class number we get and fill it in the position 1 of this partial cyclic shift, position 1 with number 2. Then the next partial cyclic shift is equal to the previous one, so we fill the position 3 with number 2. This is how it works in this example. Now let's implement it in the general case. This procedure ComputeCharClasses takes as input both the string S and the precomputed order of all the single characters in this string. We initialized the array class as the array of size length of the string S and for the smallest partial cyclic shift for which the position is stored in the 0th element of the array order, we set the class number to 0 because this is the smallest partial cyclic shift. There are no other partial cyclic shifts which are smaller than this one. Then we go for all the other partial cyclic shifts in the increasing order from number 1 to number length of S minus 1. If the current partial cyclic shift, which is just one character, if it is not equal to the previous character in the order, it means that it is bigger and that the class number is bigger by one than the previous class number and this is what we do in these two lines of the code. Otherwise, if the new character is the same as the previous character in their order, then their class numbers are the same and we just assign the previous class number, and in the end, we just return the array with classes of the characters. The lemma states that the running time of this procedure, ComputeCharClasses is linear and basically it is clear because there is just one for loop with length of S iterations and everything inside this loop is constant time. In the next video, we're going to discuss how to sort doubled cyclic shifts given that we already sorted the initial partial cyclic shifts and we also know their class numbers.