[MUSIC] I'm Goncalo Abecasis. I'm a researcher at the University of Michigan, where I work on the analysis of genetic association studies and on developing tools that make it easier to analyze genetic data. For the next little series of lectures, I'm going to be talking to you about basic quality control checks for human genetic studies. It might seem like a bit of a dull topic, but as you'll see, there are fun things that show up, and it's also pretty essential.
I always like to start by taking a minute to think about how human genetics has evolved over the past 10 to 20 years. One way to measure it is to say that every year we now have about four times more genetic data than the year before, if you count the combination of the number of samples we're analyzing and the number of genetic markers we're measuring. This has been really good in many ways. We now have hundreds of genetic variants associated with many, many traits, ranging from diabetes to schizophrenia, from how tall you are to what your blood pressure is, with many other things in between. As you all know, there are still many challenges in figuring out the meaning of those associations and really connecting them to specific genes and disease mechanisms.
One of the workhorses of these discoveries is the genome-wide association study, which typically looks at millions of markers and thousands to millions of samples. For the next little bit, we're going to talk about those studies, about some of the challenges, and about some of the quality control steps there.
One of the things I wanted to point out as we start is that sometimes we now think it's not so exciting to find all these association signals. It's kind of boring. There are so many of them, they're in a big sea of things. But it's also good to look back and see how far we've come in human genetics. Take, for example, this review from 2002 by Joel Hirschhorn and colleagues.
They actually said, hey, let's look at association studies that have been done at least three times, same trait, same genetic marker, and see if they gave consistent results. They found 160 such studies. And they said, in the end, only six of those gave consistent results. So you'd think that the quality of genetic association studies was pretty low.
Now, since we switched to genome-wide studies, what we see is much more promising. You see this progression. I picked the example of macular degeneration. It's an area where I work, where in 2010 we had six or seven loci. In 2013, repeating the analysis with more samples, there are even more loci. And in 2015, in a yet bigger analysis, now reaching over 30,000 individuals, there are yet more signals. You could zoom into the signals and look at the genes they contain. You learn things like, for example, that complement pathway genes are very likely involved in macular degeneration. In the top six or seven loci, five of them are complement genes: complement factor H, complement factor B, complement 2, complement 3, etc.
So this is what we would like to see in any genetic association study: we start with the global view and then we zoom in to these individual regions. Here we have a locus near complement factor H, with many associated variants and several genes there. At the beginning, it might be hard to guess which of these is the causal gene. But here we have several clues: the top variant is often one of the coding variants in CFH, and we know that there are other complement pathway genes at other loci.
As we look around the genome, here's another one of the loci that has a very compelling story to tell. Right next to that peak, you see the gene called VEGFA. And it turns out that VEGF is the target of the most efficacious drugs to treat macular degeneration, which basically stop disease progression in half of patients. And you say, I know what's going on, the study was worthwhile.
And here's another locus. This one is a little more challenging because you have a big sea of genes near the peak. You could make a compelling case for several of the genes. Some people have said that it's complement 2, others have said it's complement factor B, others have said it's SKIV2L. This is a debate that's ongoing, and we know that to really make sense of it, we need to either do further experiments to test these genes in a mouse, or test them in a model system where we can understand their function a little better, or maybe take a global view that integrates other sorts of information besides genetic association.
What I was trying to do there is get you in the right frame of mind for thinking about genetic association studies: the things one would like to do with them, and what they look like when they work.
Next, I wanted to show you that sometimes things don't go quite as expected. So here's another genetic association study, one that we actually executed recently. What you'll see is that the signals are quite strong, if those are log p-values on the scale. There are ten to the minus 15s, ten to the minus 20s, but they're all over the place. You really would have very little confidence that you found anything in particular, and you'd think that this is mostly looking at noise. It's interesting to think about what sorts of things can get you into this situation: first, so that you can prevent it, and second, if you happen to be in a study that ends up in a spot like this, so that you'd know how to get out of it.
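To make the scale on that plot a little more concrete, here is a minimal Python sketch of how a p-value like ten to the minus 15 maps onto the minus log 10 axis. The marker names and p-values are made up for illustration, and the 5e-8 threshold is the commonly used genome-wide significance convention, which is an assumption here and not something stated in the lecture.

```python
import math

# Conventional genome-wide significance threshold.
# Assumption: this value is not given in the lecture.
GENOME_WIDE_THRESHOLD = 5e-8


def neg_log10(p):
    """Return -log10(p), the scale usually drawn on the y-axis of an association plot."""
    return -math.log10(p)


# Hypothetical p-values, for illustration only.
p_values = {"marker_a": 1e-15, "marker_b": 1e-20, "marker_c": 0.03}

for name, p in p_values.items():
    status = "passes" if p < GENOME_WIDE_THRESHOLD else "does not pass"
    print(f"{name}: p = {p:g}, -log10(p) = {neg_log10(p):.1f}, {status} the threshold")
```

The point of the sketch is only that a very small p-value becomes a very tall point on the plot. It says nothing about whether the signal is real: when tall points are scattered all over the genome, as in the study above, the size of the individual p-values is not reassuring on its own.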