Enrichment analysis can be useful for summarizing results that are statistically

significant for a number of different things, not just for gene sets.

So here I'm going to talk about an example where we're looking for

enrichment when we compare, for example,

two particular sets of results and see if they're enriched for one another.

And so as an example of this we're going to be looking at SNPs,

Single Nucleotide Polymorphisms.

And so we're going to be looking at SNPs that are labeled with one of two

different labels for two different analyses.

So first we are going to look at eQTLs, Expression Quantitative Trait Loci.

We'll talk about what those are later in the class, but for

now just consider that as one analysis that has been done, giving one set of labels.

And then a second is a set of SNPs that have been implicated when

genome wide association studies have been done.

So the SNPs that have been implicated in genome wide association studies, and

the SNPs that have been implicated as eQTLs,

we want to know if they are enriched with respect to one another.

So one thing that you could do is, for example,

count the number of eQTLs that correspond to

SNPs that are in a particular set of GWAS hits or GWAS SNPs.

And so here that's that number at a particular P value cutoff,

one times ten to the negative four.

And so then what you could do is you could take another randomly selected subset of

SNPs that aren't GWAS hits and count the number of eQTLs.

And if you do that for different random selections you get this distribution here.

And so you can see that the counts in this distribution are in general lower than

the number of eQTLs that you get in the observed set of GWAS SNPs.

So again you're doing a permutation scheme to try to identify

if there's an enrichment of eQTL among GWAS hits.
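
As a rough sketch of that resampling idea, and not the specific analysis behind the plot, the code below counts eQTLs among a set of GWAS SNPs and compares that count to counts from random sets of non-GWAS SNPs of the same size. Everything here is an assumption made for illustration: the function name `enrichment_by_resampling`, the toy data, and the idea that `is_eqtl` has already been defined at some P value cutoff. It also ignores minor allele frequency and other SNP properties.

```python
import random

def enrichment_by_resampling(is_eqtl, gwas_idx, n_draws=1000, seed=0):
    """Compare the eQTL count among GWAS SNPs to random non-GWAS SNP sets.

    is_eqtl  : list of booleans, one per SNP (hypothetical labels)
    gwas_idx : indices of the SNPs that are GWAS hits (hypothetical)
    """
    rng = random.Random(seed)
    gwas_set = set(gwas_idx)

    # Observed statistic: number of GWAS SNPs that are also eQTLs.
    observed = sum(is_eqtl[i] for i in gwas_set)

    # Null distribution: same-sized random sets drawn from the non-GWAS SNPs.
    non_gwas = [i for i in range(len(is_eqtl)) if i not in gwas_set]
    null_counts = []
    for _ in range(n_draws):
        draw = rng.sample(non_gwas, len(gwas_set))
        null_counts.append(sum(is_eqtl[i] for i in draw))

    # Empirical one-sided p-value with the usual add-one correction.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_draws + 1)
    return observed, null_counts, p_value

# Toy data (made up): 2000 SNPs, the first 100 are "GWAS hits" and are
# simulated to be eQTLs more often than the rest.
toy_rng = random.Random(1)
n_snps = 2000
gwas_hits = list(range(100))
is_eqtl = [toy_rng.random() < (0.5 if i < 100 else 0.1) for i in range(n_snps)]

observed, null_counts, p_value = enrichment_by_resampling(is_eqtl, gwas_hits)
print("observed eQTL count among GWAS SNPs:", observed)
print("mean count in random SNP sets:", sum(null_counts) / len(null_counts))
print("empirical p-value:", p_value)
```

A histogram of `null_counts` with a vertical line at `observed` would give the same kind of picture as the distribution described above.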

So another thing that you can do is you can make this two by two table.

So we count the number of SNPs that are eQTLs and are GWAS hits.

And then we can count the number of SNPs that aren't eQTLs but

are also GWAS hits.

We can also look at the SNPs that are not GWAS hits,

the number that are eQTLs.

And then, again among the SNPs that are not GWAS hits, the number that aren't eQTLs.

And then we calculate, are these two things independent of each other?

In other words, is being an eQTL independent of being a GWAS hit?

You can do this with Fisher's exact test or chi-square test.

Those are statistical tests that you could use.
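
To make the two by two table concrete, here is a minimal sketch of a one-sided Fisher's exact test computed directly from the hypergeometric distribution using only the Python standard library. The function name `fisher_one_sided` and the counts are hypothetical, chosen only to illustrate the calculation. In practice you would more likely call an existing routine such as `fisher.test` in R.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                    eQTL    not eQTL
    GWAS hit          a        b
    not GWAS hit      c        d

    Returns P(X >= a), where X is the number of eQTLs among the GWAS hits
    if eQTL status and GWAS status were independent (all margins fixed).
    """
    n_gwas = a + b          # row total: GWAS hits
    n_eqtl = a + c          # column total: eQTLs
    n = a + b + c + d       # all SNPs
    upper = min(n_gwas, n_eqtl)
    numerator = sum(
        comb(n_eqtl, k) * comb(n - n_eqtl, n_gwas - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n, n_gwas)

# Made-up counts for illustration only.
a, b, c, d = 30, 70, 190, 1710
odds_ratio = (a * d) / (b * c)
p_enrich = fisher_one_sided(a, b, c, d)
print("odds ratio:", odds_ratio)
print("one-sided p-value:", p_enrich)
```

An odds ratio above one together with a small p-value would suggest that eQTLs are over-represented among GWAS hits, under the assumption that SNPs are exchangeable, which is exactly the assumption that is hard to justify along the genome.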

You could also use the logistic regression technique that we learned

previously in the class.

But what people typically do is, again,

they use permutations.

So, here, they're not necessarily permuting the samples.

They're permuting the set of SNPs that they get that are associated with

GWAS hits, or they resample from the total possible set of SNPs.

And so then they're trying to look for enrichment that they've observed.

When they do that,

it's actually quite complicated because they have to account for the fact that,

for example, it's much more likely to get a GWAS hit if you have a higher minor

allele frequency because you'll have more power to be able to detect it.

They have to do this permutation or

this resampling within levels of minor allele frequency.
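
One way to picture resampling within levels of minor allele frequency is sketched below: each random SNP set is drawn so that it has the same number of SNPs in each MAF bin as the GWAS hits do. This is an illustrative assumption about how such matching could be done, not the procedure of any particular package. The bin edges, the function names `maf_bin` and `maf_matched_null`, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def maf_bin(maf, edges=(0.05, 0.1, 0.2, 0.3, 0.4)):
    """Return the index of the MAF bin (bin edges are an arbitrary choice)."""
    return sum(maf >= e for e in edges)

def maf_matched_null(maf, is_eqtl, gwas_idx, n_draws=1000, seed=0):
    """Resample non-GWAS SNPs so each draw matches the GWAS hits by MAF bin."""
    rng = random.Random(seed)
    gwas_set = set(gwas_idx)

    pool = defaultdict(list)  # MAF bin -> indices of non-GWAS SNPs
    need = defaultdict(int)   # MAF bin -> number of GWAS hits in that bin
    for i, m in enumerate(maf):
        b = maf_bin(m)
        if i in gwas_set:
            need[b] += 1
        else:
            pool[b].append(i)

    observed = sum(is_eqtl[i] for i in gwas_set)
    null_counts = []
    for _ in range(n_draws):
        count = 0
        for b, k in need.items():
            # Raises ValueError if a bin has fewer non-GWAS SNPs than needed.
            draw = rng.sample(pool[b], k)
            count += sum(is_eqtl[i] for i in draw)
        null_counts.append(count)

    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_draws + 1)
    return observed, null_counts, p_value

# Toy data (made up): 3000 SNPs with random MAF; the first 60 are "GWAS hits".
toy_rng = random.Random(2)
n_snps = 3000
maf = [toy_rng.uniform(0.01, 0.5) for _ in range(n_snps)]
is_eqtl = [toy_rng.random() < 0.05 + 0.3 * m for m in maf]
gwas_hits = list(range(60))

obs_matched, null_matched, p_matched = maf_matched_null(maf, is_eqtl, gwas_hits)
print("observed:", obs_matched, "empirical p-value:", p_matched)
```

The same pattern extends to other properties you might need to match on, by binning on several features at once, although the bins get sparse quickly.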

So this is a common problem that you run into when you're doing this sort of

enrichment along the genome.

You find that genomic features are usually clustered together or

they have common properties.

And so you have to take those into account when you're doing this analysis.

This is one example of a package that attempts to take into account some of that

spatial structure that's involved in genomic enrichment.

But there are also a number of other issues that you have to deal with.

And so getting the null right is really hard in this case.

In this case you want to know what is the case where we