A multi-institute research team that includes Carnegie Mellon Department of Statistics & Data Science faculty and graduate students has published the largest whole-genome sequencing study of autism to date. The researchers discovered tens of thousands of rare mutations in noncoding DNA sequences and assessed whether these contribute to autism spectrum disorder.
Kathryn Roeder, the UPMC Professor of Statistics and Life Sciences, and Ph.D. students Kevin Lin and Lingxue Zhu were among the researchers who used cutting-edge statistical models to analyze data from 1,902 families, each comprising both biological parents, a child affected by autism and an unaffected sibling.
Titled “Genome-wide de Novo Risk Score Implicates Promoter Variation in Autism Spectrum Disorder” and published Dec. 14 in Science, the study is one of 13 released that day as part of the first round of results to emerge from the National Institute of Mental Health’s PsychENCODE consortium, a nationwide research effort that seeks to decipher how noncoding DNA contributes to psychiatric diseases such as autism, bipolar disorder, and schizophrenia.
For years, scientists have used genome-wide studies to find common variants that confer disease risk.
In contrast, this research team focused on creating a computational framework capable of finding rare, high-impact variants associated with a human disorder by looking across all the noncoding regions of the genome.
Scientists representing Carnegie Mellon University, UC San Francisco, University of Pittsburgh School of Medicine, Massachusetts General Hospital, Harvard Medical School and the Broad Institute led the research team.
Over the past decade, scientists have identified dozens of genes associated with autism by studying so-called “de novo” mutations — newly arising changes to the genome found in children but not their parents. To date, most de novo mutations linked to autism have been found in protein-coding genes. It has proven far more difficult for scientists to identify autism-associated mutations in noncoding regions of the genome.
“Protein-coding genes clearly play an important role in human disorders like autism, yet their expression is regulated by the ‘noncoding’ genome, which covers the remaining 98.5 percent of the genome and remains somewhat mysterious,” said Prof. Roeder.
Yet little is known about the role of mutations in noncoding regions, including whether they contribute to childhood developmental disorders, which noncoding elements are most vulnerable to disruption, and the manner in which information is encoded in the noncoding genome.
“Because the genome comprises three billion nucleotides, identifying which portions of the noncoding genome, when mutated, enhance the risk of autism is as challenging as looking for a needle in a haystack,” said Prof. Roeder.
Using a novel bioinformatics framework, the researchers were able to compress the search from billions of nucleotides to tens of thousands of functional categories that potentially contribute to autism. Working with these categories, they used machine learning tools to build a statistical model, trained on a subset of the families in the study, to predict autism risk. They then applied this model to an independent set of families and successfully predicted patterns of risk in the noncoding genome.
Though rare de novo mutations were found in many noncoding regions of the genome, the strongest signals arose from promoters, noncoding DNA sequences that control gene transcription. The risk-conferring mutations fell most often in the distal part of the promoter, the portion farthest from the gene under its control. The affected sites were also found to be largely conserved across species, suggesting that any rare mutations that might arise there are more likely to disrupt normal biology.
The team’s findings have practical implications for future research on model organisms, like mice, as attempts are made to move toward genetically informed therapies for autism. But the value of studying the noncoding genome extends well beyond autism.
“We were particularly interested in the elements of the genome that regulate when, where and to what degree genes are transcribed. Understanding this noncoding sequence could provide insights into a variety of human disorders,” said Bernie Devlin, professor of psychiatry at the University of Pittsburgh School of Medicine.
“We are just scratching the surface of what there is to learn about noncoding regulatory variation in human disease, and the new methods this team has developed will catalyze an important step forward into larger and more comprehensive studies,” said Michael Talkowski of Massachusetts General Hospital, Harvard Medical School and the Broad Institute.
The Science article concludes that de novo mutations in the noncoding genome contribute to autism spectrum disorder (ASD).
“The clearest evidence of noncoding ASD association came from mutations at evolutionarily conserved nucleotides in the promoter region. The enrichment for transcription factor binding sites, primarily in the distal promoter, suggests that these mutations may disrupt gene transcription via their interaction with enhancer elements in the promoter region, rather than interfering with transcriptional initiation directly,” the article states.
The National Institutes of Health, the Simons Foundation Autism Research Initiative and the Broad Institute’s Stanley Center for Psychiatric Research provided funding for this research.