B. Devlin, Kathryn Roeder and Larry Wasserman
During the past decade, mutations affecting liability to human disease have been discovered at a phenomenal rate, and that rate is increasing. For the most part, however, those diseases have a relatively simple genetic basis. For diseases with a complex genetic and environmental basis, new approaches are needed to pave the way for more rapid discovery of genes affecting liability. One such approach exploits large, population-based samples and large-scale genotyping to evaluate disease/gene associations. A substantial drawback of such samples is that population heterogeneity can induce spurious associations between genes and disease. We describe a method called genomic control (GC), which obviates many of the concerns about population substructure by using features of the genomes present in the sample to correct for stratification. Two general approaches to population-based association studies are now available. The GC approach exploits the fact that population substructure generates "overdispersion" of the statistics used to assess association. By testing multiple polymorphisms throughout the genome, only some of which are pertinent to the disease of interest, the degree of overdispersion generated by population substructure can be estimated and taken into account. The other approach, called Structured Association (SA), assumes that the sampled population, while heterogeneous, is composed of subpopulations that are themselves homogeneous. By using multiple polymorphisms throughout the genome, the SA method probabilistically assigns sampled individuals to these latent subpopulations. We review GC in detail. In addition to outlining the published ideas on this method, we describe several extensions: to quantitative trait studies, and to case-control studies with haplotypes and multiallelic markers. For each study design, our goal is to achieve control similar to that obtained in a family-based study, but with the convenience of a population-based design.
Keywords: population substructure, bias, case-control study, overdispersion, latent class model
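The overdispersion idea behind GC can be illustrated with a minimal Python sketch. It rests on assumptions that are ours, not stated in the abstract: each biallelic marker is tested with a Cochran-Armitage trend test on case and control genotype counts; the inflation factor λ is estimated as the median of the per-marker 1-df chi-square statistics divided by the median of the χ²₁ distribution, floored at 1; and each statistic is then divided by λ before p-values are computed. The function names `trend_test`, `genomic_control`, and `chi2_1df_pvalue` and the example counts are hypothetical, and the haplotype, multiallelic, and quantitative-trait extensions are not covered.

```python
import math
import statistics

# Median of a chi-square(1 df) variable: the square of the standard normal 0.75 quantile (~0.4549).
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) for one biallelic marker.

    cases, controls: genotype counts ordered (aa, Aa, AA); scores weight the genotypes.
    """
    n = [r + s for r, s in zip(cases, controls)]   # genotype totals
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * m for t, m in zip(scores, n))
    sum_t2n = sum(t * t * m for t, m in zip(scores, n))
    u = N * sum_tr - R * sum_tn                     # score statistic
    denom = R * S * (N * sum_t2n - sum_tn ** 2)
    if denom == 0:
        return 0.0  # monomorphic marker or a single group: no information
    return N * u * u / denom


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_control(chi2_stats):
    """Estimate the inflation factor lambda from many markers and deflate each statistic."""
    lam = max(1.0, statistics.median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_stats]


if __name__ == "__main__":
    # Hypothetical genotype counts for a handful of markers (cases, controls).
    markers = [
        ((30, 50, 20), (35, 45, 20)),
        ((25, 50, 25), (30, 48, 22)),
        ((40, 45, 15), (32, 50, 18)),
        ((20, 50, 30), (28, 52, 20)),
        ((33, 47, 20), (31, 49, 20)),
    ]
    raw = [trend_test(c, k) for c, k in markers]
    lam, adjusted = genomic_control(raw)
    print("lambda:", round(lam, 3))
    for x, y in zip(raw, adjusted):
        print(f"raw chi2={x:.3f} p={chi2_1df_pvalue(x):.4f}  GC chi2={y:.3f} p={chi2_1df_pvalue(y):.4f}")
```

Using the median rather than the mean keeps the λ estimate from being pulled upward by the few markers that are genuinely associated with disease; in practice the estimate is only meaningful when it is computed from many markers, most of which are unrelated to the trait.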