How Does a New Computational Method Transform Public Big Data Into Knowledge of Transcript Splicing?

Mar 26 2019

By Sharlene George

The findings:

A new computational framework called deep-learning augmented RNA-seq analysis of transcript splicing (DARTS) uses deep-learning-based predictions to extract more from the wealth of information available in public RNA sequencing (RNA-seq) big data sets. DARTS allows researchers to gain new insights into RNA and protein complexity, particularly for genes with low expression.

Who conducted the study:

A team from the Center for Computational and Genomic Medicine at Children’s Hospital of Philadelphia conducted the study. The team included Yi Xing, PhD, who is the Center’s director, and first authors Zijun Zhang and Zhicheng Pan, who are PhD students.

How they did it:

Dr. Xing’s team developed DARTS as an innovative approach that sheds light on “dark matter” areas of the transcriptome, where transcript isoforms of genes exist at moderate or low levels that are under the radar of conventional deep sequencing methods. DARTS first uses RNA-seq big data in the public domain to train a deep neural network for predicting changes in transcript alternative splicing. The model incorporates messenger RNA (mRNA) levels of 1,500 RNA binding proteins and 3,000 sequence features. To allow researchers to utilize the deep learning model in their own RNA-seq studies, the deep neural network predictions are combined with actual RNA-seq data generated on specific biological samples using a statistical framework called Bayesian hypothesis testing. Researchers can use this information in their individual labs to better characterize alternative splicing across different biological states and conditions.
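To make the idea of combining a prediction with observed data more concrete, here is a minimal Python sketch of Bayesian hypothesis testing for a single splicing event. It is an illustration only, not the DARTS implementation: the function names, the threshold `tau`, the grid approximation, and the simple binomial read-count model are all assumptions made for this example. The value `prior_h1` stands in for a deep neural network’s predicted probability that the event is differentially spliced between two conditions.

```python
from math import comb


def binom_lik(k, n, p):
    """Binomial likelihood of k inclusion reads out of n total reads at inclusion level p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_differential(inc1, skp1, inc2, skp2, prior_h1, tau=0.05, grid=101):
    """Posterior probability that the inclusion level differs by more than tau.

    inc*/skp* are inclusion and skipping read counts in conditions 1 and 2.
    prior_h1 is a prior probability of differential splicing (hypothetically,
    the output of a trained network). Inclusion levels are integrated over a
    uniform grid, which is a simplification chosen for readability.
    """
    psis = [i / (grid - 1) for i in range(grid)]
    sum_h1 = sum_h0 = 0.0
    n_h1 = n_h0 = 0
    for p1 in psis:
        lik1 = binom_lik(inc1, inc1 + skp1, p1)
        for p2 in psis:
            lik = lik1 * binom_lik(inc2, inc2 + skp2, p2)
            if abs(p1 - p2) > tau:  # H1: splicing differs between conditions
                sum_h1 += lik
                n_h1 += 1
            else:  # H0: splicing is essentially unchanged
                sum_h0 += lik
                n_h0 += 1
    # Average likelihood under each hypothesis (uniform prior within each one)
    m1 = sum_h1 / n_h1
    m0 = sum_h0 / n_h0
    # Bayes' rule: weight each hypothesis by its prior probability
    numerator = prior_h1 * m1
    return numerator / (numerator + (1 - prior_h1) * m0)


if __name__ == "__main__":
    # Low coverage: the prior has a large influence on the conclusion.
    print(posterior_differential(3, 2, 2, 3, prior_h1=0.9))
    print(posterior_differential(3, 2, 2, 3, prior_h1=0.1))
    # High coverage with a clear difference: the data dominate the prior.
    print(posterior_differential(90, 10, 10, 90, prior_h1=0.1))
```

In this toy setup, an informative prior matters most when read coverage is low, while abundant reads largely override it. A real analysis would also need to account for details this sketch ignores, such as effective isoform lengths and replicate variability.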

Why it matters:

Alternative splicing of precursor mRNAs has been demonstrated to provide excellent biomarkers and therapeutic targets for diseases. DARTS gives researchers the ability to detect those kinds of events with better resolution. This is especially helpful in cases where high-throughput RNA sequencing doesn’t provide enough coverage of the particular genes researchers are interested in studying. This new prediction tool helps to sharpen fuzzy data. Given certain features of a splicing event and the abundance of RNA binding proteins, researchers can now harness the power of deep learning to predict the behavior of splicing regulation.

Quick thoughts:

“The conceptual innovation of DARTS is it provides a bridge from big data in the public domain to smaller data sets in focused studies with individual investigators,” Dr. Xing said. “DARTS offers the ability to transform massive amounts of public RNA-seq data into a knowledge base, represented as a deep neural network, of how splicing is regulated. Using this computational framework, we can push that into any individual lab. This could be really useful and increase the efficiency of the experiment and enable new discoveries. With just 20 or 30 million RNA-seq reads, you can make educated guesses and inferences on things you were never able to see in the past.”

What’s next:

Many genetic diseases in children are extremely rare, so applying DARTS to help fully explore and utilize publicly available big data could yield new insights into the transcript splicing variations underlying these diseases and lead to better diagnostic and therapeutic strategies. Dr. Xing’s team is developing a version of DARTS that can predict splicing levels in any given tissue based on personal genome sequencing data and the abundance of RNA binding proteins in that tissue.

The conceptual framework of DARTS also could be applied to other problems of gene regulation, where researchers use deep learning to distill knowledge from big data sets, couple it with data specific to a biological system, and then use Bayesian statistics to integrate the two.

Where the study was published:

The study “Deep-learning augmented RNA-seq analysis of transcript splicing” appeared online March 25 in Nature Methods.

Where to learn more:

DARTS and training modules are publicly available for exploration. For more details on the Nature Methods study, see CHOP’s press release.

Disclosure:

Dr. Xing and study co-author Douglas L. Black of the University of California, Los Angeles, are scientific co-founders of the company Panorama Medicine. Panorama holds a non-exclusive license to the DARTS technology. Dr. Xing and co-author Zijun Zhang have filed a provisional patent application for DARTS.