Harnessing the power of big data and machine learning as a digital crystal ball, 2019 Research Scientific Symposium Keynote Speaker Olga Troyanskaya, PhD, showed the audience how her lab is delving into biochemical activity in the vast non-coding regions of the genome to make predictions about genetic variants’ causal connections to rare diseases and complex common disorders.
A professor at the Lewis-Sigler Institute for Integrative Genomics and the Department of Computer Science at Princeton University, Dr. Troyanskaya wants to find the meaning in the mountain of information genomic databases have compiled in recent years. While each single-nucleotide variant associated with a disease is a stepping stone, she encouraged researchers to reach a greater understanding of specific variants’ function, their effects on gene expression, and how they may influence disease risk and pathophysiology.
“Before we can make precision medicine anywhere closer to reality, we really need to start thinking very deeply about how we can actually interpret all of those single variant changes,” Dr. Troyanskaya said.
Dr. Troyanskaya’s lab focuses on the dynamic interplay between genomics, computational methods, and traditional experimental and clinical data to surmount this complex biological challenge. To sift through the 3 billion letters of sequence in the human genome, 98 percent of which are non-coding DNA, her team developed two deep learning methods, DeepSEA and Seqweaver, to identify disease-causing mutations at the single-nucleotide level in transcriptional regulatory sequences and RNA regulatory sequences, respectively. In other words, her team teaches computer algorithms to find important patterns in sequence and to identify the processes that are likely disrupted in a genetic disorder.
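For readers who want a concrete picture of what scoring a single-nucleotide change can look like, the sketch below compares a model's prediction for a reference sequence with its prediction for the same sequence carrying one substituted base. It is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in that looks for a "TATA" motif, not DeepSEA or Seqweaver, and the function names and example sequence are invented here.

```python
import math
from typing import Callable, List

BASES = "ACGT"
Encoded = List[List[float]]


def one_hot(seq: str) -> Encoded:
    """Encode a DNA string as rows of four values in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def toy_model(encoded: Encoded) -> float:
    """Hypothetical stand-in for a trained network (an assumption, not a real model).

    Finds the best match to the motif 'TATA' anywhere in the sequence and
    squashes the match count into a score between 0 and 1.
    """
    motif = one_hot("TATA")
    best = 0.0
    for start in range(len(encoded) - len(motif) + 1):
        matches = sum(
            encoded[start + i][j] * motif[i][j]
            for i in range(len(motif))
            for j in range(len(BASES))
        )
        best = max(best, matches)
    return 1.0 / (1.0 + math.exp(-(best - 2.0)))


def variant_effect(
    seq: str, pos: int, alt: str, model: Callable[[Encoded], float] = toy_model
) -> float:
    """Return the alternate-allele score minus the reference-allele score."""
    ref_score = model(one_hot(seq))
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    alt_score = model(one_hot(alt_seq))
    return alt_score - ref_score


# Illustrative sequence containing a TATA motif at positions 3 to 6 (0-based).
example_seq = "GGCTATAAGG"
inside_motif = variant_effect(example_seq, pos=4, alt="G")   # disrupts the motif
outside_motif = variant_effect(example_seq, pos=0, alt="A")  # leaves it intact
```

In this toy setting, a substitution inside the motif lowers the score while one outside it leaves the score unchanged. That is the general intuition behind ranking variants by predicted regulatory impact, although real models learn their sequence patterns from data rather than from a hand-picked motif.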
Her team also created a framework called ExPecto that predicts expression levels directly from sequence and can predict the tissue-specific transcriptional effects of sequence variations. Mapping out the functional genomics picture of a cell and merging it with human quantitative genetics data will help to pinpoint causal variants’ molecular mechanisms more effectively, Dr. Troyanskaya said. For example, her team is using these methods to elucidate genetic variations associated with autism spectrum disorder (ASD) and to predict misexpression of neurodevelopmental proteins in brain tissues of autistic individuals.
As researchers gather information about the non-coding effects of genetic variants, Dr. Troyanskaya anticipates this new knowledge will help to illustrate which protein networks and pathways are interacting in different cell types and tissues at certain developmental stages. Such insights could one day lead to improvements in early diagnosis strategies, provide patient-specific therapeutic targets, inform intelligent drug design, and optimize patients’ individual responses to those therapies. Scientists interested in learning more about the data-driven predictions from Dr. Troyanskaya’s lab can visit the online resource HumanBase.