New Computational Method Sheds Light on How Genetic Variants Alter RNA Processing

Mar 4 2019

By Sharlene George

Every gripping novel has a key moment that sets in motion the characters’ doom or fortune. In gene regulation, RNA-binding proteins (RBPs) play a similarly pivotal role: they determine the fate of ribonucleic acid molecules (RNAs) by guiding post-transcriptional events. This regulation is essential to interpreting the genetic code and translating it into proteins, the building blocks of any organism.

Researchers with the Center for Computational and Genomic Medicine at Children’s Hospital of Philadelphia have gained new insights into how genetic variants affect RNA processing by developing a computational method called Allele-Specific Protein-RNA Interaction (ASPRIN). The tool detects and annotates individual nucleotide changes in RNAs that could influence how those RNAs interact with proteins, and thereby alter the post-transcriptional regulation of gene expression.

Finding mutations that may affect the binding of these RBPs is crucial because RBPs control important steps in the life cycle of RNA molecules — how they mature, how they are transported out of the nucleus, and how they are translated to assemble the sequence of amino acids in a protein. Defects in how RNA is processed could result in malfunctioning proteins that affect complex traits and diseases or lead to monogenic genetic disorders (caused by a mutation in a single gene).

A study reported in the American Journal of Human Genetics (AJHG) by Yi Xing, PhD, director of the Center for Computational and Genomic Medicine, and Emad Bahrami-Samani, PhD, a postdoctoral fellow, used a data-driven approach to delve into volumes of genomic data generated by the National Institutes of Health’s Encyclopedia of DNA Elements (ENCODE) Project. ENCODE is a free genomic catalog that the scientific community can explore with sophisticated computational methods such as ASPRIN to pinpoint the genomic locations of genes and changes in the regulatory elements that control them.

In the AJHG paper, Drs. Xing and Bahrami-Samani evaluated data generated from common human cell lines, focusing mostly on polymorphisms that occur naturally in the human population. The kinds of variants whose interactions the researchers observed could be related to common traits such as blood pressure, body weight, or risk of autoimmune disorders.

“We compared two populations of RNAs in the cell,” Dr. Xing explained. “One is basically all the RNA in the cell that could be sequenced. And then the other is a fraction of RNA that is actually bound to a protein of interest. From the sequencing data, we have the individual nucleotide sequencing information of those two pools of RNAs. So now, if we see that a particular nucleotide change is more prevalent in the RNAs bound to the protein but not in the whole population of RNAs in the cell, we could make an inference that a particular change enhances the protein binding.”
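
The comparison Dr. Xing describes can be pictured as a 2-by-2 table of read counts at one heterozygous site: reference versus alternative allele, in the protein-bound pool versus all cellular RNA. The sketch below is a minimal illustration that assumes a two-sided Fisher's exact test on those counts; the article does not describe ASPRIN's actual statistics, and the function names and read counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals `a` when all margins are fixed."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    # Sum the probabilities of every table with the same margins that is
    # no more likely than the observed one (small tolerance for rounding).
    probs = (hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def allele_specific_binding(bound_ref, bound_alt, input_ref, input_alt):
    """Compare allele counts in protein-bound RNA versus all cellular RNA.

    Hypothetical helper. Returns (alt fraction in bound pool,
    alt fraction in input pool, two-sided p-value, allele enriched in bound pool).
    """
    bound_total = bound_ref + bound_alt
    input_total = input_ref + input_alt
    if bound_total == 0 or input_total == 0:
        raise ValueError("each pool needs at least one read covering the site")
    alt_bound = bound_alt / bound_total
    alt_input = input_alt / input_total
    p = fisher_exact_two_sided([[bound_ref, bound_alt], [input_ref, input_alt]])
    if alt_bound > alt_input:
        enriched = "alt"
    elif alt_bound < alt_input:
        enriched = "ref"
    else:
        enriched = "none"
    return alt_bound, alt_input, p, enriched


# Hypothetical read counts at one site: the alternative allele makes up 0.2 of
# bound reads but 0.5 of all reads, so the reference allele is enriched.
print(allele_specific_binding(bound_ref=40, bound_alt=10, input_ref=25, input_alt=25))
```

An exact test is a reasonable choice here because per-site read counts are often small. A real pipeline would also need multiple-testing correction across sites and filters for read-mapping bias toward the reference allele.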

The study illustrates how researchers can use publicly available big data to better understand how genetics and DNA variation affect gene regulation. This kind of analysis became possible only recently, thanks to advances in high-throughput sequencing technologies. Five years ago, researchers did not have access to such massive amounts of data on both RNAs and protein-RNA interactions.

The results generated by ASPRIN tell researchers which nucleotide changes could affect the binding of a particular RBP and, in turn, how the RNA interacts with proteins. This knowledge could eventually help elucidate the genetic basis of complex traits and diseases, better inform genetic diagnosis, and possibly advance therapies.

For example, researchers may find that the processing of a particular RNA is associated with cancer risk in the human population, yet not know which specific variation in the human genome drives that association. By looking at the allele-specific binding map generated by ASPRIN, researchers could gain additional information to help identify the causal variant.
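
One hedged way to picture this prioritization step: take allele-specific binding p-values for many tested variants, apply a Benjamini-Hochberg false discovery rate correction, and keep only the variants that both pass the cutoff and lie in a candidate set from an association study. The data class, variant labels, RBP names, and 5% threshold below are illustrative assumptions, not ASPRIN's output format or the authors' procedure.

```python
from dataclasses import dataclass


@dataclass
class BindingResult:
    variant_id: str  # hypothetical variant label
    rbp: str         # RNA-binding protein profiled in the experiment
    p_value: float   # allele-specific binding p-value for this variant/RBP pair


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def prioritize(candidates, results, fdr=0.05):
    """Return (result, q-value) pairs for candidate variants passing the FDR cutoff."""
    qvals = bh_adjust([r.p_value for r in results])
    hits = [(r, q) for r, q in zip(results, qvals)
            if q <= fdr and r.variant_id in candidates]
    return sorted(hits, key=lambda hit: hit[1])


results = [
    BindingResult("var_a", "RBP_X", 0.001),
    BindingResult("var_b", "RBP_X", 0.01),
    BindingResult("var_c", "RBP_Y", 0.04),
    BindingResult("var_d", "RBP_Y", 0.5),
]
# Hypothetical variants in linkage with a disease association signal.
candidates = {"var_a", "var_c", "var_d"}
for result, q in prioritize(candidates, results):
    print(result.variant_id, result.rbp, round(q, 4))
```

The correction is applied across all tested variants before filtering to the candidates, so the false discovery rate reflects the full set of tests rather than only the region of interest.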

“To me, the most exciting part is our potential to look at a set of mutations and point to what could be causal variants affecting post-transcriptional regulation of gene expression, which can in turn shed light on mutations causing disease,” Dr. Bahrami-Samani said.

This conceptual framework for allele-specific analysis could be applied to studying other gene regulatory processes at the RNA level. In the meantime, Dr. Xing and his team are working on technical improvements to refine ASPRIN and reduce noise in the data that could affect the readout of allele-specific interactions.

The ASPRIN source code is available under the GNU General Public License version 3.0 and can be downloaded from GitHub.