B. Devlin, Kathryn Roeder and Larry Wasserman
Individuals who share a disease mutation from a common ancestor
often share alleles at genetic markers adjacent to the mutation even
if the common ancestor is remote. The alleles at these adjacent
markers, called the haplotype, can be visualized as a string of
realizations of random variables, which may be dependent when
individuals are related in some fashion. Ideally, for a sample of
individuals all having the same (genetic) disease, this dependence,
measured as haplotype-sharing, will be greater in the vicinity
of disease genes than in other regions of the genome. In this paper
we present a semiparametric test for haplotype-sharing. We begin by
developing a model assuming that the ancestral haplotype is known
and thus the extent of haplotype-sharing from a common ancestor can
be determined unambiguously. The amount of overlap at markers far
from the disease locus is treated as a random variable with an unknown
distribution F, which we estimate nonparametrically. Overlap at
markers surrounding disease genes is modeled as a location-shift mixture,
$(1-p)F(x) + pF(x-\theta)$, where θ is the shift and p is the fraction of subjects
with the disease mutation. Testing for a disease gene then amounts
to testing whether p=0. Next we drop the assumption that the
ancestral haplotype is known. To detect excess clustering of
haplotypes, we measure the overlap between every pair in a set of haplotypes.
As in the simpler scenario, the distribution of these pairwise overlaps is modeled as a
location-shift mixture. To test for excess sharing, we construct a score
test with a simple limiting distribution.
Keywords: Haplotypes, Kernel Density Estimate, Linkage disequilibrium, Mixture Models, Score Test.
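The model can be illustrated with a short sketch. This is not the authors' procedure, and every function name in it is hypothetical. Several choices are assumptions made here:

- the overlap measure (`shared_run_length`, the number of consecutive matching markers around a focal marker);
- a Gaussian kernel estimate of the background density f, using Silverman's bandwidth;
- a fixed, known shift θ in the mixture density $(1-p)f(x) + pf(x-\theta)$.

The statistic is the generic score for a mixing proportion. The derivative of the log-likelihood at p = 0 is $\sum_i [f(x_i-\theta)/f(x_i) - 1]$, which has mean zero under the null because $f(x-\theta)$ integrates to one. The sketch standardizes this sum by a null variance estimated on the background sample.

```python
import math
import random
import statistics
from itertools import combinations


def shared_run_length(hap_a, hap_b, focal):
    """Hypothetical overlap measure: consecutive markers around `focal` where two
    equal-length haplotypes (allele sequences in map order) agree."""
    if hap_a[focal] != hap_b[focal]:
        return 0
    left = focal
    while left > 0 and hap_a[left - 1] == hap_b[left - 1]:
        left -= 1
    right = focal
    while right < len(hap_a) - 1 and hap_a[right + 1] == hap_b[right + 1]:
        right += 1
    return right - left + 1


def pairwise_overlaps(haplotypes, focal):
    """Overlap for every unordered pair; pairs sharing a haplotype are dependent."""
    return [shared_run_length(a, b, focal) for a, b in combinations(haplotypes, 2)]


def gaussian_kde(sample, bandwidth=None):
    """Gaussian kernel density estimate of the background overlap density f."""
    m = len(sample)
    if bandwidth is None:
        # Silverman's rule of thumb; an assumption, other bandwidth rules would do
        bandwidth = 1.06 * statistics.stdev(sample) * m ** -0.2
    norm = 1.0 / (m * bandwidth * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in sample)

    return f_hat


def mixture_score_test(overlaps, background, shift):
    """One-sided score test of H0: p = 0 in (1 - p) f(x) + p f(x - shift).

    Assumes independent overlaps and a known shift; f is estimated from
    `background`, i.e. overlaps at markers far from any disease gene.
    Returns (z, p_value).
    """
    f_hat = gaussian_kde(background)

    def score_term(x):
        # d/dp log[(1 - p) f(x) + p f(x - shift)] at p = 0; floor avoids 0-division
        return f_hat(x - shift) / max(f_hat(x), 1e-300) - 1.0

    score = sum(score_term(x) for x in overlaps)
    # Null variance of a single score term, estimated on the background sample
    null_var = statistics.fmean([score_term(y) ** 2 for y in background])
    z = score / math.sqrt(len(overlaps) * null_var)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail N(0, 1) probability
    return z, p_value


if __name__ == "__main__":
    print(pairwise_overlaps(["AAAA", "AAAB", "BAAA"], focal=1))  # [3, 3, 2]
    # Synthetic continuous "overlaps", for illustration only
    random.seed(2)
    background = [random.gauss(0.0, 1.0) for _ in range(300)]
    candidate = [random.gauss(2.0 if random.random() < 0.3 else 0.0, 1.0)
                 for _ in range(100)]
    print(mixture_score_test(candidate, background, shift=2.0))
```

Pairwise overlaps share haplotypes and are therefore dependent. `mixture_score_test` as written treats its input as independent, so it should not be applied to the output of `pairwise_overlaps` without accounting for that dependence. The paper's own score test addresses the pairwise setting.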