DNA methylation is one of many epigenetic marks that are often measured with high-throughput technology, such as next-generation sequencing or microarrays, so we're going to talk a little bit about the analytical pipeline. The first thing is, what is DNA methylation? DNA methylation refers to a methyl group being attached to the cytosine at CpG sites in the genome. A CpG site is a C followed by a G, so you can see here is a CpG, and here's another CpG, and another CpG, and each of those is bound by the molecule we've shown in the cartoon with this methyl group. The goal is to identify the places where that methylation is occurring.

There are multiple technologies, using next-generation sequencing and microarrays, to measure this, and I'm going to talk about two of them. One is bisulfite conversion followed by sequencing. The idea here is that you split the DNA into two identical samples; you basically randomly create two samples from the same set of DNA. Then you bisulfite convert one of the two samples, which converts cytosines that are not methylated into uracils, while methylated cytosines are left unchanged. You then sequence the samples and align the reads back to the genome, and you have to account for the conversion that has happened, which makes the alignment process a bit trickier. Once you do that, you can compare the converted sample to the unconverted sample at each cytosine and see how often it was converted, as a measure of how much DNA methylation is at that location.

Another common way to measure methylation is with Illumina methylation arrays or other types of methylation arrays. Here you similarly do the bisulfite conversion step, but then you hybridize the DNA to a microarray.
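Both approaches start from bisulfite conversion, so it may help to see that step as code. The following is a minimal Python sketch (standard library only) under stated assumptions: reads are already aligned to the reference without gaps, which sidesteps the tricky alignment step, and all function names are hypothetical. Rather than comparing a converted and an unconverted sample, it uses the common per-site summary of counting, among converted reads, how many still show a C versus a T at each CpG; unmethylated cytosines show up as T because PCR copies uracil as thymine.

```python
def bisulfite_convert(seq, methylated):
    """Return `seq` as it reads after bisulfite treatment and PCR.

    Unmethylated cytosines become uracil, which PCR copies as thymine (T);
    cytosines at the 0-based indices in `methylated` are protected and stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def cpg_sites(seq):
    """0-based index of the C in every CpG dinucleotide of `seq`."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylation_level(reads, pos):
    """Fraction of reads keeping a C at `pos`: C / (C + T).

    Assumption: reads are gap-free and already aligned to the reference,
    so reference position `pos` is index `pos` in every read.
    """
    bases = [read[pos] for read in reads]
    c, t = bases.count("C"), bases.count("T")
    return c / (c + t) if c + t else float("nan")


# Hypothetical toy example: four DNA molecules copied from one reference.
reference = "ACGTTCGACG"
sites = cpg_sites(reference)               # [1, 5, 8]
molecules = [{1, 5}, {1, 5}, {1}, {1}]     # methylated C positions per molecule
reads = [bisulfite_convert(reference, m) for m in molecules]
for pos in sites:
    print(pos, methylation_level(reads, pos))
# prints "1 1.0", then "5 0.5", then "8 0.0"
```

Site 1 is methylated in every molecule, site 5 in half, and site 8 in none, and the C / (C + T) fraction recovers exactly those levels in this idealized, error-free setting.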
After hybridization to the microarray, there is one probe that corresponds to the unmethylated version of a locus and one that corresponds to the methylated version, and you measure the intensity of both. You then use that information to estimate how much methylation is present at that locus.

The first step in either process, whether you use bisulfite sequencing or DNA methylation arrays, is to normalize the samples, and that involves processing a couple of different things. First, you want to be able to detect whether there was methylation at each locus. You have to compare the methylated DNA to the unmethylated DNA, whether that's through comparing the converted and unconverted sequenced samples or through the hybridization signal. There are a couple of packages you can use to do this, the charm package and the minfi package. The minfi package is designed for Illumina methylation arrays, and charm focuses on the CHARM microarray platform; for bisulfite sequencing data, the bsseq package fills this role. Now, I'm talking about microarrays here, although we primarily talk about sequencing throughout the class, because a large number of DNA methylation studies still use microarray technology, especially studies with large sample sizes.

The next step is smoothing. Often you see DNA methylation data that look like this when you measure it across the genome. Genomic location is on the x-axis, and a methylation measurement after normalization is on the y-axis. You can see that it jumps around, so the idea is to find clumps of points like this that sit at a high level, where the region is highly methylated. The usual way to do that is to smooth the data and then identify bumps, that is, regions that are differentially methylated or that are methylated. You can do that with the charm or bsseq packages. Then you want to do region finding.
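The array summary and the smoothing step can be sketched in the same way. This is an illustration under assumptions, not what minfi, charm, or bsseq do internally: the beta value, M / (M + U + offset), is a widely used summary of methylated (M) and unmethylated (U) intensities, with the offset of 100 following Illumina's convention, and the smoother is a plain running mean within a fixed window that stands in for the packages' more sophisticated smoothers. The function names and intensities are hypothetical.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Estimated proportion methylated: M / (M + U + offset).

    Negative intensities are clipped to 0; offset=100 follows Illumina's
    convention and keeps beta stable when both intensities are small.
    """
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def running_mean(positions, values, bandwidth):
    """Smooth `values` by averaging all sites within +/- `bandwidth` bp.

    A simple box smoother for illustration (O(n^2)); charm and bsseq use
    their own, more sophisticated smoothing methods.
    """
    smoothed = []
    for p in positions:
        window = [v for q, v in zip(positions, values) if abs(q - p) <= bandwidth]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical (M, U) intensities for six probes along one chromosome.
positions = [100, 200, 300, 400, 500, 600]
intensities = [(900, 0), (800, 100), (900, 0), (0, 900), (100, 800), (0, 900)]
betas = [beta_value(m, u) for m, u in intensities]  # [0.9, 0.8, 0.9, 0.0, 0.1, 0.0]
smoothed = running_mean(positions, betas, bandwidth=100)
print([round(s, 3) for s in smoothed])
```

The bandwidth controls the trade-off the lecture describes: too narrow and the curve still jumps around with the raw points, too wide and neighbouring methylated and unmethylated clumps blur together.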
Once you've done that smoothing, you want to identify regions that differ between groups; in this case the groups are different tissues: brain, liver, and spleen. You can see that the smooth curve through each of these is a little different, so you fit a statistical model to identify those regions and then label them with their genomic location. Again, you can do that with the charm or bsseq packages, depending on what type of data you have.

The next step is to annotate the regions and identify what they are. You want to annotate them to different components of the genome, and for DNA methylation in particular there are categories specific to methylation, including CpG islands, which are places where many CpGs occur close together. Then there are the shores, the regions right next to the CpG islands, and other regions outside of those that are also measured. DNA methylation microarrays are biased toward the particular regions where their probes were designed, and you have to account for that when you're doing annotation, while bisulfite sequencing is a little less biased in which regions it measures. So that's a little bit about DNA methylation analysis.
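Region finding and annotation can be sketched as a threshold-and-merge over a smoothed difference between two groups, followed by an overlap check against CpG island intervals. This is a simplification: it applies a fixed cutoff instead of fitting the statistical model the lecture mentions, and the function names, cutoff, merge gap, and 2 kb shore width are assumptions chosen for illustration (2 kb is a commonly used shore definition). The differences and island coordinates below are made up.

```python
def find_regions(positions, diff, cutoff, max_gap=300):
    """Group sites where |diff| >= cutoff into candidate regions.

    Consecutive passing sites with the same sign and at most `max_gap` bp
    apart are merged. Returns (start, end, n_sites, mean_diff) tuples.
    Threshold-and-merge only: no statistical model or significance test.
    """
    runs, current = [], []
    for pos, d in zip(positions, diff):
        if abs(d) < cutoff:
            if current:
                runs.append(current)
                current = []
            continue
        if current:
            last_pos, last_d = current[-1]
            if pos - last_pos > max_gap or (d > 0) != (last_d > 0):
                runs.append(current)
                current = []
        current.append((pos, d))
    if current:
        runs.append(current)
    return [(r[0][0], r[-1][0], len(r), sum(d for _, d in r) / len(r)) for r in runs]


def annotate(start, end, islands, shore_width=2000):
    """Label a region 'island', 'shore' or 'other' relative to CpG islands.

    `islands` is a list of (start, end) intervals on the same chromosome.
    Overlapping any island gives 'island'; otherwise lying within
    `shore_width` bp of an island gives 'shore' (2 kb is an assumed width).
    """
    label = "other"
    for i_start, i_end in islands:
        if start <= i_end and end >= i_start:
            return "island"
        if start <= i_end + shore_width and end >= i_start - shore_width:
            label = "shore"
    return label


# Hypothetical smoothed differences (e.g. liver minus brain) at seven CpGs.
positions = [100, 200, 300, 400, 1000, 1100, 5000]
diff = [0.05, 0.40, 0.45, 0.38, -0.30, -0.35, 0.02]
islands = [(150, 450)]                      # hypothetical CpG island
regions = find_regions(positions, diff, cutoff=0.25)
for start, end, n, mean_d in regions:
    print(start, end, n, round(mean_d, 3), annotate(start, end, islands))
```

Here the first region overlaps the island and the second sits within 2 kb of it, so they are labelled "island" and "shore". With array data, the island and shore counts would also reflect where probes were placed, which is the bias the lecture warns about.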