Expression quantitative trait loci, or eQTL, analysis is one of the most common integrative analyses performed in genomics. An eQTL analysis is one where you're trying to identify variations in DNA that correlate with variations in RNA. So basically what you do is measure the abundance of different RNA molecules and measure the DNA in those same samples, and then you try to correlate the variation in DNA with the variation in RNA. This is representative of a whole class of problems associated with combining different genomic data types, whether it's proteomic data and RNA data, or DNA data and RNA data, or RNA and methylation, and then trying to integrate those data to identify the cross-regulation between these different measurements. One of the first examples of an eQTL study was this study by Brem et al. in 2002 in Science. They crossed two strains of yeast and created 112 random segregants. Once they had those yeast segregants, they measured mRNA expression, at the time using gene expression microarrays, and then they measured genotypes using a microarray genotyping tool. The goal was to identify associations between the expression levels and the genotypes. So you can think of this as basically having two components. One is the SNP data, so that's the markers or SNPs measured in the genome; in this case, it's the yeast genome. So you have the position of the particular SNP that you're measuring. Then you also have information about a particular gene, like how much that gene is turned on or expressed, and you have the information on where that gene is located in the genome as well. So you're basically trying to do an association between all possible gene expression levels and all possible SNPs.
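To make the "all SNPs versus all genes" idea concrete, here is a minimal sketch in Python. It is not the method used by Brem et al.; it simply assumes two genotype classes per marker (labeled BY and RM) and uses a Welch t-statistic as one possible measure of a difference in mean expression. The marker names, gene names, and expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic for a difference in mean expression between two groups."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def scan_all_pairs(genotypes, expression):
    """Test every SNP against every gene; returns {(snp, gene): t-statistic}."""
    results = {}
    for snp, calls in genotypes.items():
        for gene, values in expression.items():
            # Split the samples' expression values by genotype at this SNP.
            by = [v for v, g in zip(values, calls) if g == "BY"]
            rm = [v for v, g in zip(values, calls) if g == "RM"]
            results[(snp, gene)] = welch_t(by, rm)
    return results


# Toy data (hypothetical): 8 segregants, 2 markers, 2 genes.
genotypes = {
    "snp1": ["BY", "BY", "BY", "BY", "RM", "RM", "RM", "RM"],
    "snp2": ["BY", "RM", "BY", "RM", "BY", "RM", "BY", "RM"],
}
expression = {
    "geneA": [5.1, 4.9, 5.3, 5.0, 7.9, 8.2, 8.0, 7.8],  # differs by snp1 genotype
    "geneB": [3.0, 3.2, 2.9, 3.1, 3.1, 2.9, 3.2, 3.0],  # no clear signal
}

stats = scan_all_pairs(genotypes, expression)
# Print pairs from strongest to weakest association.
for pair, t in sorted(stats.items(), key=lambda kv: -abs(kv[1])):
    print(pair, round(t, 2))
```

With 2 markers and 2 genes this is only 4 tests, but the nested loop shows why the number of tests grows as the number of SNPs times the number of genes.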
So, this obviously complicates the issue of multiple testing, because you're testing all possible SNPs against all possible gene expression values. You can think about it as performing a whole gene expression microarray analysis for every single SNP. If you have thousands or hundreds of thousands of SNPs, that's thousands or hundreds of thousands of microarray analyses. And you're basically looking for cases like this one. In this case, there are the two strains, BY and RM, so those are surrogates for the genotypes. Here, you're looking for differences in expression. So here you don't see any difference, or not a very strong difference, in expression between the BY and RM strains for this particular gene and this particular variant. Here, for this other variant and this other gene, you do see a difference in the mean level of expression between the two genotypes. That would be classified as an eQTL if it passed the significance thresholds. And this is typically the kind of plot that you can make when you do an eQTL analysis. On the x-axis here, we've got the position of the marker or the genotype, so again, that's where the SNP is positioned in the genome. On the other axis you have the trait position, so that's where the gene whose expression was measured is located. So basically you can imagine: where's the gene that codes for the mRNA that is being measured, and where is the SNP that's being measured? Then you just line up the chromosomes on each axis. This circled component right here, this diagonal line, represents what are typically called cis-eQTL. Cis-eQTL are often defined as eQTL where the SNP position is close to the position of the gene whose expression is measured. And then there are also what are called trans-eQTL, and in this case, there appear to be lots of trans-eQTL.
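The two ideas above, the size of the multiple testing problem and the cis versus trans distinction, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure from the study: the 10,000 bp window is a hypothetical cutoff (the definition of "close" varies between studies and organisms), the Bonferroni correction is just one simple and conservative way to set a threshold, and the SNP and gene counts are made up.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_pos, window=10_000):
    """Label a SNP-gene pair 'cis' if both lie on the same chromosome
    within `window` base pairs of each other, otherwise 'trans'.

    The default window is an assumption chosen for illustration only.
    """
    if snp_chrom == gene_chrom and abs(snp_pos - gene_pos) <= window:
        return "cis"
    return "trans"


def bonferroni_threshold(n_snps, n_genes, alpha=0.05):
    """Per-test p-value threshold when every SNP is tested against every gene."""
    n_tests = n_snps * n_genes
    return alpha / n_tests


# Hypothetical pairs: same chromosome and nearby, same chromosome but far,
# and different chromosomes.
print(classify_eqtl("chrIV", 100_000, "chrIV", 105_000))
print(classify_eqtl("chrIV", 100_000, "chrIV", 900_000))
print(classify_eqtl("chrIV", 100_000, "chrXII", 100_500))

# Hypothetical study size: 1,000 SNPs tested against 5,000 genes.
print(1_000 * 5_000, "tests; per-test threshold", bonferroni_threshold(1_000, 5_000))
```

On the plot described above, pairs labeled cis by a rule like this are the ones that fall along the diagonal.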
What's often been noticed, though, is that when you see these big stripes of loci that seem to associate with many genes' expression levels, very often those tend to be artifacts. It might be a batch effect or some other artifact in the data that is driving the variability. Now, that may or may not be true in a given case. If you identify, for example, a biological reason that there might be a large number of associations between a particular locus and lots of genes' expression, the signal might be real. But typically, your assumption is that it might be an artifact if you see these large stripes in the pattern, where a particular marker is associated with many genes. So this idea is actually really popular right now. It's being used in a large number of studies. One of the most recent and very large-scale studies of gene expression variation in the context of eQTL is the GTEx project, where they took multiple donors, and from each donor they took multiple tissues, and they measured information about the DNA sequence. They also measured the level of expression in various tissues, say brain, heart, and liver, and then they performed eQTL analyses both across tissues and within tissues. So they've identified a large number of eQTL, including cross-tissue eQTL. That data is all available, and you can start analyzing it yourself if you're interested. So eQTL analysis is an area that's here to stay and is probably the most popular of the integrative genomic applications. So just some notes and further reading. The cis-eQTL tend to be more believable than trans-eQTL. The cis-eQTL are those eQTL where the SNP position, or the variant position, is close to the coding region of the gene. The trans-eQTL are those where the SNP position is very distant from the position of the gene. There are many potential confounders here.
So in this analysis, just like in a GWAS analysis, you have to adjust for population stratification. You also have to adjust for things like batch effects in the gene expression data, just like you would in a gene expression analysis. And then there are even more complicated things like sequence artifacts, where a sequence artifact could actually make it look like there's an eQTL, especially a trans-eQTL, when it's not actually there. This paper I've linked to here is an excellent review of many of the issues associated with eQTL analysis, if you want to learn a little bit more about that.
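As a small illustration of why the batch adjustment matters, here is a sketch in Python with made-up numbers. It assumes a genotype coded 0/1 and a known batch label per sample, and it removes the batch effect by centering both genotype and expression within each batch before fitting a slope. That is one simple way to adjust for a known categorical covariate; real analyses typically fit a regression with several covariates (or estimated factors), and this is not presented as the approach of any particular study.

```python
from collections import defaultdict
from statistics import mean


def center_within_groups(values, groups):
    """Subtract each group's mean from its members, removing group-level shifts."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Toy data (hypothetical): batch b2 has higher expression overall and also
# happens to contain more samples with genotype 1.
genotype = [0, 0, 0, 1, 0, 1, 1, 1]
batch = ["b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"]
expression = [5.0, 5.1, 4.9, 5.0, 8.0, 8.1, 7.9, 8.0]

# Ignoring batch, genotype looks strongly associated with expression.
naive = slope(genotype, expression)

# After centering within batch, the apparent association disappears.
adjusted = slope(
    center_within_groups(genotype, batch),
    center_within_groups(expression, batch),
)

print("naive slope:", round(naive, 3))
print("batch-adjusted slope:", round(adjusted, 3))
```

In this toy example the unadjusted slope is large purely because genotype and batch are correlated, which is the kind of artifact that can show up as a false eQTL.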