So in this section, I'm going to describe about string compression, and Burrows-Wheeler transform. In the previous section, Phillip gave a great description of the suffix tree. It allows us to find a pattern very quickly. At the same time, the memory and the run time required for constructing the suffix tree for the genome is linear, O( |Genome| ). If some of you believe Phillip and try to construct a suffix tree for many genomes on your desktop or laptop. You will have problem with memory. Why? Because the big-O notation here ignores an important constant. The best known suffix tree implementation requires about 20 times the length of the genome. Can we reduce this constant factor into 1? Or even smaller? I now switch the topic. I want to talk about genome compression. It seems to you that Son just changed the topic very quickly. But I will promise that the thing that I'm going to describe, that is how to compress a genome will have a relation to the problem that we are discussing today. That is, multiple pattern matching. But allow me to describe the genome compression first. Give you this string, is there a way to present it shorter, or is there a way to compress it? Well, you can see that you have a lot of runs of G and C and A and T and so on. So instead of writing the genome in this form, you can write instead of writing "GGGGGGGGGG", you have just write "10G" Instead of writing "C" 11 times, you can just write "11C". So, you see that if the genome has a lot of runs like this, it would be easy to compress, but the problem is that the genomes don't have lots of runs. But they do have lots of repeats. Can we somehow transform the genome with lots of repeat, into another string, genome star with a lot of runs. And then run the run-length encoding to compress genome. How we deal with this step? Let me describe the Burrows-Wheeler Transform. I have a string, "panamabananas" terminated with a special tag "$". Let's first form all cyclic rotations of "panamabananas". We start with "p". The next one is dollar sign. Next one start with "s". And, so on so forth. We start a next rotation. So there are N possible rotations, right? Yes. What should we do next? Now, I sorted all these strings in lexicographic order. You see the color, here, just to show you that these strings are sorted. What is the Burrows-Wheeler transform of these strings? If you take the last column of this text, it is called the BWT. But why would we care about the last column? Well, let's look at an example. This is actually the text from Crick and Watson for the double helix structure of the DNA. And this is the one region of the BWT. So you see, the last column is BWT. It has a lot of runs. Why? Because the text here has a lot of repetitive words. In this case it is the word "and". You may ask me another question. Why don't I chose the first column as the BWT? Well it also contains a lot of runs, right? Yes. I believe Phillip will give you a better explanation.