Welcome back. In this module we'll be talking about the initial attempts to identify the specific genetic variants that underlie the heritability of schizophrenia. Previously this week we talked about twin and adoption studies that establish that there are heritable influences on schizophrenia. We don't want to stop there, just by saying that schizophrenia is heritable. What we really want to know is, what are the genetic factors that underlie that heritable effect? In the early 1990s, as the Human Genome Project was progressing, a natural question to ask was, well, how is the Human Genome Project going to help us understand the genetic basis of human disease? Recall that the Human Genome Project's goal was to sequence a single human genome, a composite genome. But what we really want to know is, what are the genetic variants that exist within our collective genomes that might underlie our risk for various diseases? I know this is a fairly complicated-looking figure, but what we need to understand here is, at a conceptual level, what the figure represents. Francis Collins was the second head of the Human Genome Project within the US, and he is currently the head of the National Institutes of Health. He's one of the most prominent human geneticists in the world. In 1992 he was asked to articulate how human geneticists would go about identifying the genetic variants that underlie human disease once the human genome was sequenced. He articulated a strategy called the positional cloning strategy, and it's illustrated here. For our purposes, what I'd like to emphasize is that Collins in 1992 was articulating a systematic way of probing the human genome to identify risk alleles. We'd first begin by determining that a trait or a disease was heritable. And we've done that with schizophrenia.
We'd then try to identify the regions of the human genome, the chromosomal regions, that might harbor the relevant risk alleles. We'd identify genes within those chromosomal regions, and eventually variants or mutations within the genes that actually affect an individual's risk for developing the disease. This is what Collins called the positional cloning strategy, a systematic way of going from heritability to identifying the relevant variants or mutations. And the reason he articulated this strategy is because it had worked. We knew it had worked. This is a graph from the early 2000s. What I want to emphasize here is this particular curve, which is the success in mapping single-gene or Mendelian disorders. You can see that, beginning in the late 1980s and then really taking off in the 1990s, human genetics became extremely successful in mapping Mendelian disorders. Things like cystic fibrosis or Huntington's disease or [UNKNOWN], which we talked about earlier. At this time, several thousand Mendelian diseases have been mapped in the human genome. These curves down here, and in particular this one, show the ability to map more complex diseases, like, for example, schizophrenia or heart disease or cancer. At least in the early 2000s, progress there was fairly modest. But the positional cloning strategy had worked for the Mendelian disorders, so that's what people looking at these more complex traits adopted. People studying schizophrenia, beginning in the 1990s, started to follow the strategy that Collins had articulated: it's heritable, so let's find the chromosomal regions and then the genes within those regions. We don't need to get hung up on specifically what they did. What we should understand is that by the mid-2000s, psychiatric geneticists had looked at the human genome and had identified about 10 to 12 regions, like this region on chromosome 6, or 8, or chromosome 1, or chromosome 22.
These regions looked to be implicated in schizophrenia, and researchers began to identify genes within these regions that were likely candidates for the genes affecting risk for schizophrenia. Things like COMT, or Neuregulin, which is the one up here on chromosome 8. And really, that set the stage for the paper I've asked you to read, which is the Sanders et al. study. If you haven't had a chance to look at that, you might put me on pause right now, grab that article, and look through it, because I'm going to go through the article now. From my perspective, what that article represents is the culmination of a strategy that was articulated in the early 1990s: let's find the variants underlying schizophrenia risk using the positional cloning strategy. If you've read the introduction of the Sanders paper, what they set out saying is, we now have these positional candidate genes, so let's see whether or not they're associated with schizophrenia. So what's the methodology they use? First of all, it's a case-control study. They have a sample of individuals with schizophrenia and a sample of controls. The sample of people with schizophrenia is about 1900 individuals coming from the U.S. and Australia. This is actually quite a large sample for a genetic study, certainly for 2008. In order to get such a large sample, they actually had to combine three different studies. They had about 2000 controls, all from the US. The controls weren't rigorously screened for schizophrenia; they had a kind of minimal screening. But that's not much of a concern, because even if there are a few people with schizophrenia in the control sample, it's not going to mess up their results. There aren't going to be very many. One thing I'll point out, and we won't really have an opportunity to go into great depth on this in this course, is that all of the individuals in this sample were of European ancestry.
And it turns out that that's important in human genetic studies. The genetic architecture differs depending upon our ancestral background. It's different for Europeans than for people of African ancestry or people of Asian ancestry. So, in order to control for that, what human geneticists will usually do, at least in the initial stages, is restrict themselves to one ethnic group. Here they're looking at individuals of European ancestry. And actually, one of the concerns in the Human Genome Project, especially now that the genome is sequenced and we're trying to apply it, is that there's been a bias, I don't think intentional, but a bias toward focusing on individuals of European ancestry. There's really a need to study more broadly. We're going to look at a couple of schizophrenia studies, one in this module and one in the next. Unfortunately, they're both based on European samples. They're based on European samples because you need to control for ethnicity. It would be nice if other ethnic groups were studied as well. There are other studies, but they're just not as large as the ones we're going to look at here. In any case, Sanders is looking at Europeans. That's the sample, a very large sample to put together. As for the genes, as they point out in the article, there's a list of 14 genes. I'm not going to go through the list here; it's not really that important. What's important is to recognize that it's the culmination of a process, of 10 or 15 years of research, to identify the genes that psychiatric geneticists thought were most likely to harbor variants affecting schizophrenia risk. These were the positional candidates. They'd come out of previous research. They'd been pretty well documented. In each gene, they actually genotyped multiple genetic markers, or SNPs. Now, recall that a couple of weeks ago, or maybe it was last week, we talked about different types of genetic variants.
A SNP is a single nucleotide polymorphism, a difference in the nucleotide base of DNA, one base versus another. And they're actually fairly easy to genotype. This is adapted from their Table 2, and I'll just point out that they had 14 genes and they genotyped 648 genetic markers, or SNPs, in those genes. And this is how they're spread out over the genes. Some are in the coding regions of the gene, the exons; some are in the introns. Some are in the promoter regions. Some are in the 5' and 3' untranslated regions. And some are actually outside the coding regions per se. With 648 SNPs genotyped across 14 genes, there would be somewhere between 45 and 50 SNPs per gene on average. If you look at one of their tables, it'll tell you how long each of these genes is. Some are several thousand bases long; I think the longest is about a million bases of DNA long. And you might recall, when we talked about SNPs, that they occur about once in every 300 bases of DNA. They didn't take every SNP that they possibly could in these genes. They selected ones that they thought were most relevant but that also had a desirable statistical property. That property is called tagging. They refer to this in the article, and I just want to give you a conceptual understanding of what they meant by tagging. Suppose we had a region of DNA here where we have three SNPs: SNP 1, SNP 2, and SNP 3. And this is just to designate that it's not drawn to scale. SNP 1 and SNP 2 are separated by 1000 bases of DNA, one kb. SNP 2 and SNP 3 are separated by a megabase, or a million bases of DNA. It turns out that if SNPs are close to one another on the chromosome, it's likely that you can predict one SNP from the other. So in this case, if there are two SNPs that are within 1000 bases of one another, SNP 1 will probably be able to predict what SNP 2's status is.
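The idea that one SNP can predict a nearby SNP can be sketched numerically. The snippet below is only an illustration, not the procedure used in the Sanders paper: it computes r², the squared correlation between two SNPs, which is one common way to quantify how well one marker predicts another. The function name `r_squared` and the haplotype counts are made up for this example.

```python
def r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one pair per chromosome,
    with each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at the first SNP
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at the second SNP
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # departure from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Hypothetical data: 100 chromosomes for each pair of SNPs.
# Nearby pair: the two alleles almost always travel together.
close_pair = [(1, 1)] * 40 + [(0, 0)] * 58 + [(1, 0)] * 1 + [(0, 1)] * 1
# Distant pair: knowing one allele tells you nothing about the other.
far_pair = [(1, 1)] * 16 + [(1, 0)] * 24 + [(0, 1)] * 24 + [(0, 0)] * 36

print("r^2, nearby SNPs: ", round(r_squared(close_pair), 3))
print("r^2, distant SNPs:", round(r_squared(far_pair), 3))
```

An r² near 1 means genotyping one SNP tells you nearly everything about the other; an r² near 0 means the two SNPs have to be genotyped separately.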
So there's no need to genotype both SNP 1 and SNP 2. SNP 1 would probably suffice for both. You can predict, or tag, or impute (those are all used synonymously here) SNP 2 knowing what SNP 1 is. However, when they're far apart, let's say on the order of a million bases here, SNP 1 is no longer likely to predict SNP 3. So SNPs that are close to one another on the chromosome can predict one another, and you don't need to genotype all of them. You can tag the ones you didn't genotype. They actually used the statistical procedure of tagging, or synonymously imputing, to tag 433 other SNPs in addition to the 648 SNPs they genotyped. I'm going to focus on the 648 for our purposes. The same results really apply to the 433 they tagged in addition. So let's see what they found. This is adapted from their Table 4, and an alternative representation of the results is given in their Figure 1. I'm not going to reproduce Figure 1 here. They tested 648 SNPs, and what are they doing? For each SNP, they look at the frequency. Remember that a SNP is one allele versus the other, so they look at the frequency of the less frequently occurring allele in the group of schizophrenics versus the group of controls. They do that 648 times, and they count how many times they find a statistically significant difference in the frequency of that allele. When they tested at a p value of .05, they found a significant result 30 times out of 648. When the p value was .01, they found three significant results. The question is, are these significant? Did they really find anything? To understand whether or not they found anything, we need to review a little bit about testing and Type I errors in statistics.
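A single one of those 648 comparisons can be sketched as a 2×2 chi-square test on allele counts. This is a minimal illustration of a generic allelic case-control test, not necessarily the exact statistic Sanders et al. used. The function name `allelic_test` and the minor-allele counts are hypothetical; only the rough sample sizes (about 1900 cases and 2000 controls, so about 3800 and 4000 chromosomes) come from the lecture.

```python
import math


def allelic_test(case_minor, case_total, control_minor, control_total):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are minor and major allele counts.
    Returns (chi-square statistic, p-value).
    """
    table = [
        [case_minor, case_total - case_minor],
        [control_minor, control_total - control_minor],
    ]
    n = case_total + control_total
    row_sums = [case_total, control_total]
    col_sums = [case_minor + control_minor, n - case_minor - control_minor]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP with the same minor-allele frequency (30%) in both groups.
print(allelic_test(1140, 3800, 1200, 4000))
# Hypothetical SNP with 34% in cases versus 30% in controls.
print(allelic_test(1292, 3800, 1200, 4000))
```

Running this once per SNP and counting the p-values below a threshold gives the kind of tally reported in the lecture: so many significant results out of 648.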
If you run 648 statistical tests and you test each at a 5% probability level, then by chance, even if all of the null hypotheses are true, five percent of the 648 tests will be statistically significant at the p less than 0.05 level. Five percent of 648 is about 32, so by chance they would expect 32 significant results at the 0.05 level. How many did they get? 30. If you run 648 tests at the 1% level, then by chance you would expect 1% of those to be significant at a p-value of less than 0.01, or about six. By chance you would expect six. How many did they find? They found three. They found basically what you would expect if the results were purely chance. Even though they found things significant at the 5% level, because they were doing multiple tests, they didn't find more than you would expect just by chance, and similarly for the 1% level. What a disappointment! Can you imagine? Fifteen years of work, and this was the culmination, with a large sample of schizophrenics. I've talked to the researchers that did this study, and I'm sure that they were shocked. This is not what they expected to find. They were expecting a breakthrough, that we would really find the genetic variants here. Why didn't they find anything? I think it's a really interesting question. Editorials were written when this paper was published in 2008. Maybe schizophrenia's not heritable. Some people have argued that. But in my opinion, the twin, the family, and the adoption studies are just too compelling to go back and argue that schizophrenia is not a heritable disease. Maybe these weren't the right candidate genes. They followed the strategy, but maybe these weren't the right ones. Maybe, since there are many more SNPs in these regions, they didn't pick the right SNPs. I don't think that's likely. They did a very good job of picking the SNPs that made sense, both in terms of previous research and in terms of statistical properties.
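The chance-expectation arithmetic from the multiple-testing discussion (32 expected versus 30 observed, six expected versus three observed) can be checked with a short sketch. The helper names are made up for this example, and the binomial calculation assumes the 648 tests are independent, which is a simplification because nearby SNPs are correlated.

```python
from math import comb


def expected_by_chance(n_tests, alpha):
    """Expected number of significant results if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha), assuming independent tests."""
    below = sum(
        comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below


n_snps = 648
for alpha, observed in [(0.05, 30), (0.01, 3)]:
    expected = expected_by_chance(n_snps, alpha)
    chance = prob_at_least(observed, n_snps, alpha)
    print(f"alpha={alpha}: expected {expected:.1f}, observed {observed}, "
          f"P(at least {observed} by chance) = {chance:.2f}")
```

Under these assumptions, seeing 30 or more hits at the 5% level, or three or more at the 1% level, is more likely than not even when no SNP has any real effect, which is why the observed counts are not evidence of association.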
Maybe the sample of almost 2000 people with schizophrenia and 2000 controls was not large enough. Well, in my opinion, and we're going to talk about this throughout the remainder of the course, there are two major reasons they didn't find anything. First, they didn't pick the right candidate genes. Not because of anything they did wrong; these just weren't the right candidate genes. And secondly, even though their sample was gigantic by any standard of 2008, it wasn't large enough. Next time we're going to talk about an alternative to the Sanders et al. approach, one that actually emerged after Sanders and colleagues did their study. Thank you.