The previous lecture ended with a rather difficult algorithmic challenge that we will try to solve using the Burrows-Wheeler Transform and suffix array. Let's start with the Burrows-Wheeler Transform. Allow me to slightly change the focus. Instead of pattern mention, we'll talk about text compression. And run-length encoding is the simplest way to compress text. Where a run of a single symbol is substituted by the number of times the symbol appeared in this run, followed by the symbol itself. You may be wondering why we want to do run-length encoding for genomes because genomes don't have many runs. But they do have many repeats. For example, more than half of human genome is formed by repetitive DNA and the lion's share of many plant genomes is formed also by various types of repeats. But here's an idea. Let's convert test into something else so that our repeats will be converted into runs. We'll start from the genome then we'll turn it into ConvertedGenome and then we will apply run-length encoding to the ConvertedGenome. Because our hope is that ConvertedGenome will have many runs. Let me show how we can accomplish this. So let's consider all cyclic rotations of our favorite string, panamabananas$. We start from this one, and this one, and this one, and continue to form all cyclic rotations of this string. And then after you generated all the cyclic rotations, let's sort them. The dollar sign is viewed as the first letter of alphabet, even before A, so we'll start with $panamabananas. Continue, continue, continue, continue, and finally we'll have a sorted list of all suffixes of the text. You might be wondering why we are doing this, but look at this strange sync. If we look at the last column of the resulting array and the last column of the resulting array is called Burrows-Wheeler transform of the text. That you will notice that also our regional string, panamabananas$ did not have many runs. The Burrows-Wheeler transform of this string actually has many runs. For example here's a run of five A in the Burrows-Wheeler Transform of our original text. How have we achieved it? Let me explain it by using an example from the famous Double Helix Paper by Watson and Crick where they first presented the structure of DNA. And these are just some consecutive strings in Burrows-Wheeler transform of this book. And you see that there are many runs of A in the Burrows-Wheeler transform of this text. Why so many runs of A? Well one of the most common words in English is and. And every time you have and in the text, it is likely to contribute to a run of A in the Burrows-Wheeler transport, as you see in this example. So our goal now is to start from the genome, apply Burrows–Wheeler transform to the genome. And we can now, hopefully, comprise Burrows–Wheeler transform of the genome. And after you apply this compression, we will greatly reduce memory for storing our genome. But it totally makes sense if we can invert this transformation. And from compression version of Burrows-Wheeler transform, we can easily go back to the original Burrows-Wheeler transform. But can we go back from the Burrows-Wheeler transform of the genome, to the genome itself, is it even possible?