Hello, I'm Bruce Weir from the Department of Biostatistics at the University of Washington. I'm going to be talking about some basic issues in population genetics. We're going to focus on some ideas and on how we use data to translate those ideas into inferences about the populations we study or about the traits we study. We may, for example, be trying to find an association between a genetic marker and a human disease.
We'll start with the Hardy-Weinberg Law, which is the foundation of population genetics. The Hardy-Weinberg Law is a result that was published in 1908 by two people, Hardy and Weinberg. Weinberg was a German physician who wrote a very long and very good paper about the biological issues surrounding what we now call the Hardy-Weinberg Law. G. H. Hardy was an English mathematician who certainly was not particularly interested in genetics at that point. He wrote a little paper in Science, in the US, and apologized up front for taking space in the journal and bringing what he thought was a very trivial result to the readers' attention. It was a very important result, and arguably it's his most important publication. What he said was that if we've got three genotypes at a gene with two alleles, then the genotype proportions remain constant over time, as do the allele frequencies.
What is the Hardy-Weinberg Law? At one level it's just a piece of algebra, but biologically it's very important. It assumes the population we're studying is very large, technically infinite. Individuals within the population mate completely at random. There are no evolutionary forces: there's no natural selection favoring one genotype over another, there's no mutation changing one allele into another, and there's no migration into or out of that population. It's an idealized situation. It's certainly not strictly true, but it's very convenient, and data can often be shown to obey the law that results from those assumptions.
If those assumptions do hold, then the allele proportions, p_A and p_B for alleles A and B, and the three genotype proportions, P_AA, P_AB, and P_BB, all remain constant over time. The allele frequencies are not changing because there are no evolutionary processes causing them to change. More importantly, though, the law is a statement about pairs of alleles. Much of what we are going to be talking about here is indeed statements about two alleles. Under the conditions of Hardy-Weinberg, the chance that two alleles are both of the same type, A for example, does not depend on whether they're in the same individual or in different individuals.
A consequence of that statement is that the proportion of a homozygote in the population is just the square of the allele proportion. The homozygote results from the union of two alleles of that type, and both of those alleles are drawn at random from a very large pool of alleles in which that type has proportion p. The heterozygote proportion is just the product of the two different allele proportions, for A and B, with a factor of two, because it doesn't matter as far as the law is concerned whether the A comes from an individual's mother and the B from the father or vice versa. We generally don't know the parental origin of the two alleles when we observe a genotype, nor does it matter: it doesn't affect the law.
The Hardy-Weinberg Law, in summary, states that allele and genotype proportions remain constant over time and, moreover, that the genotype proportions are squares and products of the allele proportions.
What does the Hardy-Weinberg Law imply? Put simply, we can reduce the dimensionality of the data. We'll be talking about SNPs, variants with only two possible alleles: positions in the genome that can be occupied by either of two different alleles. With two alleles there are only three genotypes.
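The squares-and-products statement, and the claim that the proportions then stay constant, can be checked with a few lines of Python. This is only a minimal sketch: the function names and the starting genotype proportions are made up for illustration, and it assumes an effectively infinite, randomly mating population with the same allele proportions in both sexes and no selection, mutation, or migration.

```python
def hw_genotype_proportions(p_a):
    """Return (P_AA, P_AB, P_BB) expected under Hardy-Weinberg for allele proportion p_a."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("allele proportion must lie between 0 and 1")
    p_b = 1.0 - p_a
    # Homozygotes are squares; the heterozygote is the product with a factor of two.
    return p_a ** 2, 2.0 * p_a * p_b, p_b ** 2


def allele_a_proportion(p_aa, p_ab, p_bb):
    """Proportion of allele A: all alleles in AA individuals plus half of those in AB."""
    return p_aa + 0.5 * p_ab


def random_mating_generation(p_aa, p_ab, p_bb):
    """Offspring genotype proportions after one round of random mating (idealized)."""
    return hw_genotype_proportions(allele_a_proportion(p_aa, p_ab, p_bb))


# Hypothetical starting population that is NOT in Hardy-Weinberg proportions.
start = (0.5, 0.2, 0.3)
gen1 = random_mating_generation(*start)  # squares and products of p_A = 0.6
gen2 = random_mating_generation(*gen1)   # same proportions again

print(gen1)
print(gen2)
```

Under these assumptions the first round of random mating already produces the Hardy-Weinberg proportions, and the second round leaves both the allele and the genotype proportions where they were.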
Reducing those three numbers down to the two allele numbers might not seem like a huge advance, but it can certainly simplify analyses and simplify computation when we get into large-scale studies. If we have a SNP with the two alleles labeled A and B, then under Hardy-Weinberg we can code the genotypes: instead of giving them their names, AA, AB, and BB, we can give them a number, a score. The score typically is 0, 1, or 2; it just counts the number of copies of one of the alleles. Our data are reduced to a series of allele scores. A single digit will serve to summarize the genotype at a SNP.
With Hardy-Weinberg equilibrium, genotype proportions are squares and products of allele proportions. Suppose there was not Hardy-Weinberg, that the law does not hold. How can we still describe the genotype proportions using the allele proportions? How can we indicate a departure from Hardy-Weinberg?
We've got three genotype proportions to describe, and those three numbers add up to one. We say there are two degrees of freedom. We can choose two of the proportions freely, providing they're not negative or larger than one of course, but then the third one comes from the constraint that the three should add up to one. There are two degrees of freedom for genotypes. For alleles there are two proportions, and both of those must be non-negative and not greater than one. But those two must add up to one, so there's a constraint. We say there is one degree of freedom.
Logically, we can't describe a system with two degrees of freedom when there's only one parameter. We can't describe the three genotype proportions with only one allele proportion unless there's Hardy-Weinberg. If there's Hardy-Weinberg, then we have the law, which makes that translation possible. When there's not Hardy-Weinberg, we need one more piece of information. We need to restore that missing degree of freedom. There are many ways to introduce another parameter to quantify departures from Hardy-Weinberg.
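The 0, 1, 2 coding of SNP genotypes mentioned above can be sketched in Python as follows. The names and the tiny sample are hypothetical, and the choice to count allele A rather than allele B is an arbitrary assumption; either allele can serve as the counted one.

```python
# Score = number of copies of allele A carried by the genotype.
# "BA" is included because parental origin does not matter.
GENOTYPE_SCORES = {"AA": 2, "AB": 1, "BA": 1, "BB": 0}


def score_genotypes(genotypes):
    """Translate genotype names into single-digit allele-count scores."""
    return [GENOTYPE_SCORES[g] for g in genotypes]


def allele_a_proportion_from_scores(scores):
    """Sample proportion of allele A: total A copies over total alleles (two per person)."""
    if not scores:
        raise ValueError("need at least one genotype score")
    return sum(scores) / (2 * len(scores))


# Made-up sample of five individuals at one SNP.
sample = ["AA", "AB", "BB", "AB", "AA"]
scores = score_genotypes(sample)
p_a = allele_a_proportion_from_scores(scores)

print(scores)
print(p_a)
```

Counting alleles this way gives the sample allele proportion whether or not Hardy-Weinberg holds; the law is what lets the genotype proportions be recovered from that one number.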
The one we'll use here, and a very convenient one, is called the within-population inbreeding coefficient, written as lowercase f. What we do is to write down each genotype proportion as the Hardy-Weinberg proportion plus or minus a deviation reflecting the departure. The deviation is in terms of the inbreeding coefficient f and the product of the allele proportions. You can see those three equations there. Those three equations have two allele proportions, so one degree of freedom there, and one inbreeding coefficient. The same f applies to the three genotypes.
What is f? We may see, later on, discussion of f as a correlation coefficient. In a sense, it's a correlation of the two alleles. Technically, it's the correlation of a number that we attach to those alleles. Instead of calling them A and B, we might call them one and zero. Then the correlation of the two alleles an individual receives, or rather of the numbers those alleles carry, is the inbreeding coefficient. Because it's a correlation, it's confined to the range minus one to plus one, although the lower limit on the negative side is actually a function of the allele proportions.
Hardy-Weinberg equilibrium corresponds to f being zero. The way we've defined it, we are not constraining f to be positive. It can indeed be negative. How negative depends on the allele proportions. If the two alleles were equally frequent, p_A and p_B both being a half, then f could be as small as minus one. When f is negative, there are more heterozygotes and fewer homozygotes than we expect under Hardy-Weinberg. When f is positive, which is the more usual situation, we get more homozygotes than expected and fewer heterozygotes.
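As a rough illustration of the parameterization just described, the three genotype proportions can be written as P_AA = p_A^2 + f p_A p_B, P_AB = 2 p_A p_B - 2 f p_A p_B, and P_BB = p_B^2 + f p_A p_B. The Python sketch below uses that standard form; the function names are hypothetical, and the lower bound on f is simply the value that keeps both homozygote proportions non-negative.

```python
def min_inbreeding_coefficient(p_a):
    """Most negative f that keeps P_AA and P_BB non-negative (requires 0 < p_a < 1)."""
    if not 0.0 < p_a < 1.0:
        raise ValueError("both alleles must be present: 0 < p_a < 1")
    p_b = 1.0 - p_a
    return -min(p_a / p_b, p_b / p_a)


def genotype_proportions_with_f(p_a, f):
    """Return (P_AA, P_AB, P_BB): Hardy-Weinberg proportions plus or minus f * p_A * p_B."""
    if not min_inbreeding_coefficient(p_a) <= f <= 1.0:
        raise ValueError("f is outside the range allowed by the allele proportions")
    p_b = 1.0 - p_a
    deviation = f * p_a * p_b
    return p_a ** 2 + deviation, 2.0 * p_a * p_b - 2.0 * deviation, p_b ** 2 + deviation


def inbreeding_coefficient(p_aa, p_ab, p_bb):
    """Recover f from genotype proportions: one minus observed over expected heterozygosity."""
    p_a = p_aa + 0.5 * p_ab
    p_b = 1.0 - p_a
    return 1.0 - p_ab / (2.0 * p_a * p_b)


print(genotype_proportions_with_f(0.3, 0.0))   # f = 0: Hardy-Weinberg proportions
print(genotype_proportions_with_f(0.3, 0.1))   # f > 0: more homozygotes, fewer heterozygotes
print(genotype_proportions_with_f(0.5, -1.0))  # f = -1 with equal alleles: all heterozygotes
print(min_inbreeding_coefficient(0.2))         # lower limit depends on allele proportions
```

With equally frequent alleles the lower limit is minus one, and at that extreme every individual is a heterozygote; for unequal allele proportions the limit is closer to zero.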