By Jillian Rose Lim, Barbara Drosey, Sharlene George, and Nancy McCann
Researchers exchanged big ideas about big data at the 2019 Scientific Symposium, an event that brought together the bright minds of our Children’s Hospital of Philadelphia research community. A lineup of thought-provoking speakers from CHOP and the University of Pennsylvania shared presentations corresponding with the symposium’s themes, “Big Data” and “Today’s Discoveries and Tomorrow’s Possibilities.”
“The goal [of this symposium] is to highlight the tremendous advances by CHOP investigators in the booming fields of computational biology, data science, and genomics,” said Yi Xing, PhD, chair of the event and director of the Center for Computational and Genomic Medicine at CHOP.
Beverly Davidson, PhD, director of the Raymond G. Perelman Center for Cellular and Molecular Therapeutics, began by presenting the 2019 Faculty Mentor Award to Babette Zemel, PhD, research professor in the Division of Gastroenterology, Hepatology, and Nutrition.
Dr. Zemel emphasized the importance of mentorship. “We need to prioritize and focus on mentorship in order to train the next generation and keep the legacy that we’ve established here at CHOP going,” she said. “I have the most amazing, talented group of mentees that come from so many different areas, such as hematology, cardiology, psychiatry – it knows no bounds. It’s because of their creativity and brilliance that I love mentoring at CHOP.”
Power of ‘Networks’ in Pediatric Research
To kick off the morning sessions, Kai Tan, PhD, associate professor in the Department of Pediatrics at CHOP, presented a stimulating lecture on how researchers can make sense of big data sets using molecular networks as powerful tools to unravel the basis of human disease.
“Diseases arise from perturbed molecular networks,” said Dr. Tan. “By understanding these perturbed networks, we can better understand the disease mechanisms and move toward novel treatment approaches.”
Exciting challenges lie ahead for researchers who integrate interactome data with multi-omics to further precision medicine, including diagnosis and therapy. An interactome is the set of molecular interactions that occur within a particular cell, and researchers have been mapping and collecting data on these interactions in different species. Multi-omics describes a scientific approach that analyzes multiple sets of “omics” data from the genome, proteome, transcriptome, epigenome, and microbiome.
Dr. Tan talked about some of the technical advantages of using molecular networks in analyzing multi-dimensional omics data, as well as the power of using disease-specific regulatory networks to identify noncoding risk variants, such as in the ARVIN study that came out of his lab.
Taking Genome-wide Association Studies a Step Further
Following Dr. Tan’s presentation, Struan F.A. Grant, PhD, co-director of the Center for Spatial and Functional Genomics, spoke on advances in variant-to-gene mapping via 3D genomics for common pediatric traits. Genome-wide association studies (GWAS) have revolutionized the study of genetics in disease, helping scientists to identify genetic variants associated with a particular disease. However, GWAS can only report genomic signals associated with a given trait, not the exact localization of the culprit gene.
Dr. Grant described how his team takes GWAS analysis one step further by identifying how to convert GWAS signals into the correct underlying causal genes in the loci, or DNA regions. His lab has already applied the approach to childhood obesity, type 2 diabetes, and osteoporosis, and he recently received National Institutes of Health funding as a co-principal investigator to do so for Alzheimer’s disease and sleep.
Using Big Data to Measure Risk Relationships Among Diseases
David Hill, MD, PhD, attending physician in the Division of Allergy, opened his lecture with a quote from renowned French biologist Louis Pasteur. “Nothing can give more happiness than increasing the number of discoveries, but his cup of joy is full when the results of his studies immediately find practical application,” said Dr. Hill, explaining that these are the goals of his lab, which is set to begin work in July.
Dr. Hill spoke on his work studying diversity among allergic march trajectories, in which big data plays a large part. Data sets from the CHOP Primary Care Virtual Birth Cohort, covering more than 150,000 children between 2001 and 2018, have allowed Dr. Hill and his colleagues to pull a wealth of information, such as disease acquisition based on ICD codes, allergen information, and medicine prescriptions, and analyze it to identify patterns in the allergic march. Moreover, the data allows researchers to measure risk relationships among diseases in the allergic march, particularly atopic dermatitis (AD), which is considered the “gateway” to the allergic march.
“Not every patient in the cohort will go through every single manifestation,” said Dr. Hill. “So, what are some of the features of one patient going down one path and another? We’re using big data in a different way.”
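One common way to express a risk relationship between two conditions is relative risk: the rate of a later condition among children who had an earlier one, divided by the rate among those who did not. The sketch below is only an illustration of that calculation and is not Dr. Hill’s method. The record layout, the function name `relative_risk`, the choice of asthma as the later condition, and the eight toy patients are all assumptions made up for this example.

```python
from collections import namedtuple

# Hypothetical record layout: one row per patient, with boolean flags.
Record = namedtuple("Record", ["patient_id", "atopic_dermatitis", "asthma"])


def relative_risk(records, exposure, outcome):
    """Risk of `outcome` among exposed patients divided by risk among unexposed."""
    exposed = [r for r in records if getattr(r, exposure)]
    unexposed = [r for r in records if not getattr(r, exposure)]
    if not exposed or not unexposed:
        raise ValueError("need at least one exposed and one unexposed patient")
    risk_exposed = sum(getattr(r, outcome) for r in exposed) / len(exposed)
    risk_unexposed = sum(getattr(r, outcome) for r in unexposed) / len(unexposed)
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return risk_exposed / risk_unexposed


# Made-up cohort for illustration only; these are not data from the study.
toy_cohort = [
    Record(1, True, True),
    Record(2, True, True),
    Record(3, True, False),
    Record(4, True, False),
    Record(5, False, True),
    Record(6, False, False),
    Record(7, False, False),
    Record(8, False, False),
]

rr = relative_risk(toy_cohort, "atopic_dermatitis", "asthma")
print(f"Relative risk of asthma given atopic dermatitis: {rr:.1f}")
```

In this toy cohort, half of the children with atopic dermatitis and a quarter of those without it have asthma, so the relative risk is 2.0. A real analysis would also need to account for timing of diagnoses, follow-up length, and confounders.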
Importance of Consortia-Based Approaches
When he joined Children’s Hospital of Philadelphia in 2007, Adam Resnick, PhD, envisioned implementing a process of convergence research focused on novel approaches to pediatric brain tumors. Now director of the Center for Data Driven Discovery in Biomedicine (D3b) at CHOP Research Institute, Dr. Resnick discussed its infrastructure designed to optimize consortia-based approaches.
“Data in isolation is limited,” Dr. Resnick said. “No single institution could achieve an accelerated discovery process. The connectivity and flow of information must be harnessed to be utilized. This is why consortia-based approaches are the lifeblood of pediatric research.”
To this end, Dr. Resnick serves as scientific chair for various consortia, among them the Children’s Brain Tumor Tissue Consortium (CBTTC), composed of 16 primary institutions in Australia, China, and the United States. CHOP serves as the operations center for centralized biospecimens and large-scale integration of genomic and molecular data. In September 2018, the CBTTC Pediatric Brain Tumor Atlas released the largest collection of pediatric brain tumor data, available to the public and searchable through the CHOP-led, National Institutes of Health-funded Gabriella Miller Kids First Data Resource Center portal.
Dr. Resnick shared a video of Gabriella Miller, who later died from diffuse intrinsic pontine glioma. “It turned out that within [Gabriella Miller’s] disease, one-fifth of these patients share a mutation that has nothing to do with cancers,” Dr. Resnick said. “Until data was shared in a common environment in a harmonized way, it took years to pull this together. With a disease with nine-month survival, years mean lives.”
‘Every Scientist a Data Scientist’
Jeffrey Pennington, associate vice president and chief research informatics officer, CHOP Research Institute, shared an update on Arcus, the data management strategy that will link clinical and research data at CHOP to accelerate science. Like Dr. Resnick, he joined CHOP in 2007 and shared his vision of what data can do for research and, ultimately, patients. The platform will enable projects from the simple (e.g., organizing research files) to the very complex (e.g., preliminary investigation of complex phenotypes to enable researchers to link data).
“Arcus allows us to link across datasets and find complementarity between patients,” Pennington said. “Our honest broker status allows us to merge datasets and provide data back to the research community on a de-identified basis.”
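The general idea of linking datasets and returning them on a de-identified basis can be sketched in a few lines. This is not a description of how Arcus works. It is a minimal illustration that assumes a broker holds a secret key, joins two datasets on the real patient identifier, and hands back only rows keyed by a keyed hash of that identifier. The key, the function names `pseudonym` and `link_deidentified`, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the broker; never shared with researchers.
BROKER_KEY = b"example-secret-key"


def pseudonym(patient_id: str) -> str:
    """Map a real identifier to a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(BROKER_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link_deidentified(clinical: dict, research: dict) -> dict:
    """Join two datasets on patient ID and return merged rows keyed by pseudonym."""
    linked = {}
    # Only patients present in both datasets can be linked.
    for patient_id in clinical.keys() & research.keys():
        linked[pseudonym(patient_id)] = {**clinical[patient_id], **research[patient_id]}
    return linked


# Made-up records for illustration only.
clinical = {"MRN001": {"diagnosis": "asthma"}, "MRN002": {"diagnosis": "eczema"}}
research = {"MRN001": {"variant": "placeholder_A"}, "MRN003": {"variant": "placeholder_B"}}

linked = link_deidentified(clinical, research)
print(linked)
```

Replacing identifiers is only one part of de-identification. Dates, free text, and rare combinations of attributes can also identify a patient, so a real honest broker process involves considerably more than this sketch shows.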
Since launching projects with Arcus will be more successful with “every scientist a data scientist,” the initiative is leading with education to help each scientist become an expert in secure, collaborative, reproducible, and computational research. A variety of materials and workshops are available on the Arcus Educational Portal to brush up on basic skills or delve into a specific area of expertise.
Where Clinical Science and Data Merge
Nancy Spinner, PhD, FACMG, Evelyn Willing Bromley Chair in Pediatric Pathology and chief of the Division of Genomic Diagnostics at CHOP, followed Pennington with “Clinical Research Crosstalk in Genomic Diagnostics.”
Along with Ian Krantz, MD, Dr. Spinner leads the CHOP Pediatric Genetic Sequencing (PediSeq) Project that aims to bring novel genetic testing to clinical care. Since the NIH-funded project began in 2011, samples delivered for exome sequencing within CHOP have grown from 10 per month to approximately 60 per month, and they now come from all areas of the hospital.
Dr. Spinner pointed to the development of the Division of Genomic Diagnostics and the Roberts Individualized Medical Genetics Center as evidence of CHOP’s support for the potential that this continually growing field holds for families living with rare genetic disorders.
“We’ll be generating answers and questions that will impact patient care,” she said. “I’m excited for the future.”
Elucidating Transcriptome Complexity Using RNA-seq Big Data and Deep Learning
Yi Xing, PhD, and the Center for Computational and Genomic Medicine at CHOP that he directs are in the business of finding new ways to ask stimulating biological questions using big data and computing. His hybrid experimental/computational lab accomplishes this by developing computational methods and genomic technologies to explore the complexity of gene regulation at the level of RNA.
The lab’s main focus is on alternative splicing, which is part of the process that transcribes the same DNA code of a single gene into multiple different RNA isoforms. Interpreting aberrant RNA splicing and other types of RNA-level processes underlying genetic diseases could help inform future diagnostic and therapeutic strategies.
“Those large scale datasets create unprecedented opportunities for us to investigate the patterns of alternative isoform variations and link those variations to genotypes and phenotypes, but it also creates a lot of interesting computational problems,” Dr. Xing told the Symposium audience.
Dr. Xing described a new tool called deep-learning augmented RNA-seq analysis of transcript splicing (DARTS) that uses a deep neural network trained on massive public RNA sequencing data to identify differences in alternative splicing between biological conditions. One of the lab’s most recent projects used RNA sequencing data across the human population to discover genetic variants that affect alternative splicing in the human brain and elucidate their associations with neurological traits and diseases.
Intellectual Functioning and Genetic Variance
Laura Almasy, PhD, of the Department of Biomedical and Health Informatics at CHOP and professor of Genetics at the University of Pennsylvania, enlightened the afternoon audience with her presentation, “Layering Phenotypes to Improve Power and Interpretation in Gene Localization Studies.”
Working with a sample of the Philadelphia Neurodevelopmental Cohort, which includes nearly 10,000 kids ages 8 to 21 years who received medical care within CHOP’s network, she set out to analyze neurocognitive traits and their heritability, and whether genetic influences on cognition and psychiatric symptoms vary across development. Dr. Almasy’s novel findings suggest there are shared effects between externalizing and general intellectual functioning, memory traits, and emotion recognition traits.
“We also show that genetic influences on general intelligence, general psychopathology, and externalizing behaviors are increasing throughout childhood and early adulthood which potentially has implications for gene localization studies,” Dr. Almasy said. “Possibly we can improve our power to find some of our genes of interest by allowing for the genetic variance to change with age.”
Driving Precision Medicine
Creating and turning data assets into insights is the benefit of big data.
“It’s a powerful decision-making resource,” said Hakon Hakonarson, MD, PhD, director of the Center for Applied Genomics (CAG), and by leveraging it, “it can enable better treatments and drugs, reduce medical costs, deliver optimal health care outcomes with fewer risks, and provide better selected disease targets.” In other words: precision medicine.
Attendees learned about CAG’s exciting research and discoveries — from induced pluripotent stem cells and stem cell therapy showing early promise for effective therapies, to targeted T cell therapies, CRISPR gene editing, and single cell sequencing — made possible, in part, by CAG’s biobank with samples from more than 450,000 unique patients.
“Research is enabled by our unique, scalable biobank with sample collections that are highly enriched for rare disease causing variants,” Dr. Hakonarson said.
Advantages of Long-read Sequencing
Kai Wang, PhD, associate professor of Pathology and Laboratory Medicine at the Perelman School of Medicine at the University of Pennsylvania, shared his knowledge of a niche area of genomic technology called long-read sequencing. Conventional short-read sequencing technologies have severe limitations in detecting pathogenic structural variants (SVs), potentially contributing to low diagnostic rates in clinical sequencing studies.
But the new generation of long-read sequencing technologies is able to detect SVs far more comprehensively than short-read sequencing. Together with innovative bioinformatics analysis, long-read sequencing can identify pathogenic SVs missed by short-read sequencing, and even detect traditional SVs that have yet to be sequenced. Long-read sequencing can also detect DNA and RNA modifications, contributing to understanding the epigenetic regulation of the human genome.
“Because of the long-read, this technology can solve some questions that cannot be solved by conventional sequencing,” Dr. Wang said. “We are developing novel methods and software tools to improve our understanding of long-read sequencing data, facilitate genetic discoveries, and accelerate the implementation of precision medicine.”