## Genomic control for association studies: A semiparametric test to detect excess-haplotype sharing

June, 2000

Tech Report

### Author(s)

B. Devlin, Kathryn Roeder and Larry Wasserman

### Abstract

Individuals who share a disease mutation from a common ancestor often share alleles at genetic markers adjacent to the mutation, even if the common ancestor is remote. The alleles at these adjacent markers, called the haplotype, can be visualized as a string of realizations of random variables, which may be dependent when individuals are related in some fashion. Ideally, for a sample of individuals all having the same (genetic) disease, this dependence, measured as haplotype-sharing, will be greater in the vicinity of disease genes than in other regions of the genome. In this paper we present a semiparametric test for haplotype-sharing. We begin by developing a model assuming that the ancestral haplotype is known, so that the extent of haplotype-sharing from a common ancestor can be determined unambiguously. The amount of overlap at markers far from the disease gene is treated as a random variable with an unknown distribution F, which we estimate nonparametrically. The overlap at markers surrounding disease genes is modeled as a mixture $$pF(x-\theta) + (1-p)F(x)$$, in which p is the fraction of subjects with the disease mutation and θ is the location shift. Testing for a disease gene then amounts to testing whether p=0. Next we drop the assumption that the ancestral haplotype is known. To detect excess clustering of haplotypes, we measure the pairwise overlap of a set of haplotypes. As in the simpler scenario, the distribution of this overlap is modeled as a location-shift mixture. To test the hypothesis, we construct a score test with a simple limiting distribution.
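To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on assumptions that the abstract does not state: haplotypes are represented as equal-length strings of allele labels, "overlap" is taken to be the number of contiguous matching markers around a focal marker, and F is replaced by the empirical distribution of a sample of null-region overlaps. The function names `shared_length`, `pairwise_overlaps`, and `sample_mixture` are hypothetical. The score test itself is not implemented here.

```python
import random
from itertools import combinations


def shared_length(h1, h2, focus):
    """Number of contiguous markers around index `focus` at which two
    haplotypes carry the same allele (assumed definition of overlap).
    Returns 0 if the haplotypes differ at the focal marker itself."""
    if h1[focus] != h2[focus]:
        return 0
    left = focus
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = focus
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1


def pairwise_overlaps(haplotypes, focus):
    """Overlap for every unordered pair of haplotypes, in the order
    produced by itertools.combinations."""
    return [shared_length(a, b, focus) for a, b in combinations(haplotypes, 2)]


def sample_mixture(null_sample, p, theta, n, rng):
    """Draw n values from p*F(x - theta) + (1 - p)*F(x), with F replaced
    by the empirical distribution of `null_sample` (an assumption made
    for illustration). If X ~ F, then X + theta has CDF F(x - theta)."""
    draws = []
    for _ in range(n):
        x = rng.choice(null_sample)   # resample from the estimate of F
        if rng.random() < p:          # with probability p, apply the shift
            x += theta
        draws.append(x)
    return draws


# Toy data (invented for illustration only).
haplotypes = ["AABBA", "AABBB", "BABBA"]
overlaps = pairwise_overlaps(haplotypes, focus=2)

rng = random.Random(0)
simulated = sample_mixture(overlaps, p=0.3, theta=2, n=1000, rng=rng)
```

Under the null hypothesis p=0, `sample_mixture` simply resamples the null overlaps, so comparing overlaps near a candidate locus against such draws gives an informal picture of what the location shift would look like. Pairwise overlaps computed from the same set of haplotypes are not independent, which this sketch ignores.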