Okay, everybody, so in this lecture we're going to be focusing on the next generations of sequencing, and the first of those is so-called second-generation sequencing. So what was the main innovation that allowed so-called second-generation sequencing? As you remember from what we talked about before, we really needed a new way of doing the sequencing chemistry, such that we didn't end up with a chain termination every time we wanted to determine a base. We needed some way that we didn't end up with a whole range of lengths of oligos in order to determine the sequence of a particular template strand. And one of the main innovations that allowed this is so-called sequencing by synthesis, rather than sequencing by termination. What sequencing by synthesis means is that each time a base is incorporated into the new strand that's being copied from your template by DNA polymerase, you can put that base in, you can figure out what that base was, and then you can continue to grow the chain in real time. So it's a fundamentally different way of sequencing, as opposed to first-generation sequencing, which terminated the entire sequencing reaction each time a base was determined. So how does this actually work? It can work in actually quite a few ways, if we ask ourselves what is changing when DNA polymerase takes a nucleotide and adds it to the growing chain that's copying a template. So first of all, we can look at the event where DNA polymerase actually integrates a nucleotide into the backbone. So you have a dNTP floating around. It comes in and it base pairs according to the base on the template strand. So we can monitor that incorporation. When the reaction actually happens to incorporate that nucleotide into the growing strand, there's a couple of products released as well. There's a hydrogen ion that's released, and there's also pyrophosphate that's released.
So the nucleotide triphosphate gets hydrolyzed to produce the phosphate link into the backbone, but it releases these two phosphates off the end. So we can detect any one of those events in order to do this so-called sequencing by synthesis reaction. And there's actually a whole range of commercialized second-generation, or next-generation, sequencing technologies that utilize all of these different methods. I'll go into a little bit more detail on them in the next few slides, but the thing to remember when comparing all these different technologies is, what is used most commonly today? And that would be the Illumina platform, which we're going to be going into in more depth this week. On this platform you can get 30-fold coverage of the human genome, meaning that on average you're reading every base pair 30 times. But the reads are randomly spread out across the genome, so you're not actually reading each base pair exactly 30 times. Some of them are read much more than 30 times, some of them much less, but the average is 30-fold. And that costs about 5,000 to 10,000, and actually this number may be even lower now that Illumina has just recently started sending out some of their newer-generation machines. But we'll see how that goes. The first sequencing by synthesis method is the so-called 454 sequencing by Roche, and this was the first next-generation sequencer on the market. The way that this works is that, similar to what I was talking about in the previous lecture, it uses these very tiny microfabricated wells, these kind of nano- or microwells. And they were able to design a system where you had a DNA polymerase that would be copying the DNA template, which was attached to small beads. And close to these beads, then, you could couple the release of pyrophosphate, these two phosphates that get cleaved when the base gets incorporated into the growing strand. And this substrate could then be utilized in order to produce a pulse of light using luciferase.
So every time a base was incorporated, you would get a pulse of light. And then by cycling the different bases through this machine one at a time, you could know when the light was generated in each well, given which base was being cycled through. Another type of next-generation technology, second-generation sequencing technology, is the Ion Torrent machine from Life Technologies, now Thermo Fisher. So again, it uses this kind of nano- or microfabrication technology. And essentially they've built a very sensitive, massively parallel pH meter, which detects the hydronium ion, the hydrogen ion, which is released every time a nucleotide is incorporated into the DNA. So similar to the Roche system, you have some kind of signal that is spatially resolved in each well. And by changing the nucleotide that you have in the solution available to the polymerase at any given time, you can tell in which well which base was incorporated. Okay, so Illumina, or other methods that depend on detecting the incorporation of the nucleotide itself, really depend on several novel aspects of chemistry, in the nucleotide itself and in the DNA polymerase reaction. So it consists of several steps. The first step is to introduce a particular nucleotide which is labeled with a fluorophore that's unique for that nucleotide. So you have four unique fluorescent labels, one for each nucleotide. So you put a labeled nucleotide into the machine, and you see which one hybridizes to your particular strand, okay? It will hybridize to the template strand based on base pairing, and then you can measure what fluorescent signal is there at the particular spot where your DNA strand is growing. Then, through a series of unique chemical reactions, of which there are several different variants, and we'll talk about which one the Illumina system uses, you can cleave off the fluorophore. So the fluorescent signal goes away.
And then you can do what's called removing a block. So these nucleotides that are introduced into the machine are so-called blocked, meaning that they have some functional group on them which does not allow DNA polymerase to put the next nucleotide on. This is indicated in this picture here, the third picture, reset, by removal of this X, or removal of the block. And once you do that, DNA polymerase is able to incorporate the next nucleotide, so you can again grow the chain, detect the fluorescence, reset, and go on, and on, and on. Okay, so a little bit more about this chemistry. In the case of Illumina, you have your nucleotide base and it's got a blocking residue on the three prime end, which prevents DNA polymerase from incorporating the next nucleotide. And it also has a fluorophore connected through a cleavage site. So that way, when DNA polymerase incorporates this special kind of nucleotide into the growing strand, it's there for a fixed amount of time, for as long as we leave the block on this nucleotide. If we never removed the block, then the DNA polymerase could never continue the incorporation of more bases into the growing strand. And the fluorophore is there, which allows us to detect which particular base is there. Then we can cleave the fluorophore, and we can unblock the three prime end using specific chemical reagents, and this allows the sequencing reaction to continue. Okay, so those are the major ways in which second-generation sequencing is done. Just to reiterate, the main aspects of second-generation sequencing are, one, the ability to do sequencing in a massively parallel format. So just like Sanger sequencing was scaled up to 96 and 384 wells, or even higher spatial density, the same is done with second-generation sequencing, sequencing millions and millions of strands all at once.
And the second is a chemistry-based innovation where, instead of doing sequencing by chain termination, as in first-generation or Sanger sequencing, the ability to do sequencing by synthesis was developed by looking at one of several different things which happen when DNA polymerase incorporates a base. Either the incorporation of the base itself, by a fluorescent tag on the base, or detection of the hydrogen ion or the pyrophosphate, which is released when DNA polymerase actually attaches the base to the growing strand. So even though this was a major advance and really decreased the cost of sequencing a human genome, there still are some limitations to second-generation sequencing. And the first is that, in most situations, you have to amplify your sample to get a sufficient number of sequences to meet detection thresholds. This can bias your coverage. It can bias your coverage of GC-rich sequences, because GC-rich sequences behave differently in PCR protocols. And this kind of amplification bias, where certain strands are amplified more or better than others, is inherently a problem for quantitation. It's going to distort the levels of transcripts relative to the original sample. I mentioned a technique in week one, something called unique molecular identifiers, which can potentially solve this problem of having to amplify things in the sample. So I won't go over it again here, but that's one potential solution to get around this amplification problem. The second limitation is that there are practical limits on read length, because you're having to detect which base is being added in a slow way, where you need to detect some product of the reaction and then feed in more nucleotides one at a time. This really limits the length of time of the assay and therefore how many bases can be read off. Typical Illumina runs are 50 base pairs to 100 base pairs.
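To get a feel for why read length matters, here is a minimal sketch, not anything from a real pipeline. It builds a toy random sequence with one segment copied twice and asks what fraction of possible reads of a given length occur exactly once. The function name `unique_fraction`, the 300-base duplicated segment, and the sequence sizes are all assumptions made up for illustration, and it only considers exact matches on one strand.

```python
import random

def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose read occurs exactly once in genome."""
    n = len(genome) - read_len + 1
    counts = {}
    # Count how many times each possible read of this length appears.
    for i in range(n):
        read = genome[i:i + read_len]
        counts[read] = counts.get(read, 0) + 1
    # A read is uniquely placeable only if its sequence appears once.
    unique = sum(1 for i in range(n) if counts[genome[i:i + read_len]] == 1)
    return unique / n

rng = random.Random(0)  # fixed seed so the toy example is repeatable

def random_dna(length: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(length))

# Toy "genome": a 300-base segment that appears twice (think duplicated gene),
# separated and flanked by unrelated random sequence.
repeat = random_dna(300)
genome = random_dna(500) + repeat + random_dna(500) + repeat + random_dna(500)

for read_len in (50, 100, 400):
    print(read_len, round(unique_fraction(genome, read_len), 3))
```

Any read that falls entirely inside the duplicated segment matches both copies, so short reads leave a chunk of positions ambiguous, while a read longer than the segment always reaches into unique flanking sequence.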
So you can have a lot of problems then in uniquely mapping long repeat regions of the genome, because there may not be a lot of differences to tell them apart within a 50 or 100 base pair read. So if you have gene duplication events, or if you have pseudogenes that look like other genes, it can be very difficult to uniquely identify what those are based on the short reads. So in order to solve those problems, so-called third-generation sequencing technologies are currently starting to mature a bit. And there's not many of them now. Pacific Biosciences is currently the market leader in third-generation sequencing technologies. And the way that this works is through real-time sequencing. As I was describing for the second-generation technologies, although these were sequencing by synthesis, they required a discrete step at each base incorporation in order to detect what had just happened, and then allow the next addition of a base to happen. In the Pacific Biosciences machine, they actually monitor the growth of a DNA strand in real time. The way that it works is that they immobilize DNA polymerases onto a surface, and then the template DNA can bind to that DNA polymerase, which then incorporates fluorescently labeled nucleotides in real time. So essentially the way that this works is just by having very, very good and fast optics, which allow the detection of the fluorescently labeled nucleotide which has happened to come onto the DNA polymerase at this time. And then, looking at that small pulse of fluorescence, when a proper base pair occurs, you have a longer lifetime of the fluorescence being there. And then the polymerase incorporates that, on the order of base pairs per second. So this is much, much faster than the second-generation technologies. So a little bit more detail on how this actually works is that you have DNA polymerases in each of these little spots.
And you have a type of fluorescence detection which is only looking at 100 nanometers above where these DNA polymerases are attached. So then what happens is that this DNA polymerase is copying the template DNA which is bound to it. And they have special nucleotides in solution which, instead of having three phosphates, have six phosphates. And the fluorescent tag is on the very end of all these phosphates. So the DNA polymerase is copying the template, and it needs the next nucleotide, which is diffusing in from the solution. And when the proper nucleotide comes down, the base pairing interaction, the hybridization between the two bases, is relatively stable compared to incorrect ones. So the stable one, then, has a much longer average lifetime of being on that strand, which can be picked up by this machine as a long pulse of fluorescence in real time. And depending on the color of that fluorescence pulse, you can tell which base was spending a lot of time there, and therefore which one was much more likely to have been the correct base. So this is an example of what some actual data might look like coming off this machine. And as you might expect, because it's detecting single molecules, there's actually a lot of noise in the intensity and in the length of time of each fluorescence pulse. But one really powerful thing about this technology is that you can actually detect base modifications. So sometimes DNA bases are methylated, for example. And in that case, with these DNA polymerases which are used in the PacBio machine, you can detect these types of methylation, because it takes a lot longer for the DNA polymerase to incorporate a base into the growing chain at that position. So for example here, there's some data showing on the right that if you have a methylated adenine, it takes a lot longer to incorporate than if it's not methylated.
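As a rough sketch of how that timing difference could be turned into a call, the idea is to compare, position by position, how long incorporation takes in the sample of interest against an unmodified control copy of the same sequence. Everything here is an assumption for illustration: the function name `slow_positions`, the cutoff of 2.0, and the timing values, which are made-up numbers and not real instrument output.

```python
from statistics import mean

def slow_positions(native_times, control_times, min_ratio=2.0):
    """Return (position, ratio) pairs where incorporation is much slower than control.

    native_times and control_times map a template position to a list of
    incorporation times in seconds, one per observation. Both dicts are
    assumed to cover the same positions. min_ratio is a hypothetical cutoff.
    """
    flagged = []
    for pos in sorted(native_times):
        ratio = mean(native_times[pos]) / mean(control_times[pos])
        if ratio >= min_ratio:
            flagged.append((pos, round(ratio, 2)))
    return flagged

# Made-up timings: position 11 is slow in the native sample only,
# which is the pattern a methylated adenine would be expected to give.
native = {10: [0.21, 0.19, 0.20], 11: [1.1, 0.9, 1.0], 12: [0.22, 0.18, 0.20]}
control = {10: [0.20, 0.20, 0.20], 11: [0.2, 0.2, 0.2], 12: [0.20, 0.20, 0.20]}

print(slow_positions(native, control))
```

Averaging over several observations of the same position matters here, because any single incorporation time from a single molecule is noisy.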
But also, looking at some of the data on this slide, you can see that with this machine, because it's single-molecule, real-time detection, sometimes there might be quite a bit of difficulty in calling the right base. For example, if we look at this bottom plot, and look at the cytosine residue that was called here, it's actually a very, very small pulse of very short duration. So even though there might be a cytosine residue being incorporated here, it may just be kind of a random pulse, where the cytosine happened to come into the field of view of the imager for just a moment and then went back out. So it's because of these noisy single-molecule events that this third-generation technology, this single-molecule real-time method, is actually quite noisy and has a very high error rate. For the next lecture, we're going to be talking in much more detail about a second-generation technology, one that is used very commonly, based on the protocol and machine built by Illumina.
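As a closing illustration of why those short pulses are hard to call, here is a minimal sketch of the simplest possible rule: keep a pulse as a base call only if it lasts long enough, and treat anything shorter as a nucleotide that just diffused through the detection volume. The `Pulse` record, the `call_bases` function, and the 20 millisecond cutoff are all hypothetical; real base callers are considerably more involved.

```python
from dataclasses import dataclass

@dataclass
class Pulse:
    channel: str        # base implied by the color of the fluorescence
    start_ms: float     # when the pulse began
    duration_ms: float  # how long the fluorescence stayed on

def call_bases(pulses, min_duration_ms=20.0):
    """Split pulses into called bases and rejected short pulses.

    min_duration_ms is a made-up threshold for illustration only.
    Returns the called sequence as a string plus the rejected pulses.
    """
    called, rejected = [], []
    for pulse in sorted(pulses, key=lambda p: p.start_ms):
        if pulse.duration_ms >= min_duration_ms:
            called.append(pulse)
        else:
            rejected.append(pulse)
    return "".join(p.channel for p in called), rejected

# Invented trace: the C pulse is very brief, like the one on the slide.
pulses = [Pulse("A", 0.0, 80.0), Pulse("C", 100.0, 5.0), Pulse("G", 150.0, 60.0)]
sequence, rejected = call_bases(pulses)
print(sequence, [p.channel for p in rejected])
```

The catch is that no cutoff is safe in both directions. Set it too high and real but brief incorporations get dropped, set it too low and random diffusion events get called as bases, which is one way to think about where the high error rate comes from.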