Novel Approaches to Data Analysis Lead to Discovery at SAGES

Jun 18 2019

By Barbara Drosey

Recognizing the importance of providing context for the rapid advances in whole exome and genome sequencing, data collection, and biological information, Marcella Devoto, PhD, developed the Symposium on Advances in Genomics, Epidemiology, and Statistics (SAGES) with colleagues at the University of Pennsylvania, Princeton University, Johns Hopkins University, Columbia University, and the National Human Genome Research Institute. The symposium gives researchers a space to address the analytical challenges these advances create.

“One of its major strengths is the span of research in the areas of genomics that SAGES covers, from statistical methods to clinical applications,” said Dr. Devoto, a researcher in the division of Human Genetics at CHOP and professor of Pediatrics, Genetics, and Epidemiology at Penn.

Now in its seventh year, SAGES brings theoretical and applied scientists together from across the United States to share their work utilizing analytical methods for genomics and other high-dimensional data research fields. The symposium continues to grow with the ongoing support of Children’s Hospital of Philadelphia and the Center for Clinical Epidemiology and Biostatistics at the Penn Perelman School of Medicine, as well as funding from the National Institutes of Health.

More than 30 posters filled the concourse of the Smilow Center for Translational Research, where colleagues gathered June 7 for the start of this year’s SAGES. While the symposium initially consisted only of invited presentations, poster sessions and oral presentations selected from student and postdoc abstracts have become a more prominent part of the event.

“One of the goals of SAGES has always been to increase trainees’ attendance and active participation, which we hope to encourage with these activities,” Dr. Devoto said.

Genomic Diversity

The symposium’s presenters placed a special emphasis on health disparities, including work on geographical distribution of asthma and on genomic studies in minority populations, who are typically underrepresented.

Rasika Mathias, ScD, associate professor of Medicine at Johns Hopkins School of Medicine, focused on pushing boundaries to maximize data utilization from the National Human Genome Research Institute-European Bioinformatics Institute Catalog of published genome-wide association studies (GWAS) and the National Heart, Lung, and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed). Comprising more than 80 studies and more than 100 investigators, TOPMed is a publicly available dataset that includes thousands of phenotypes with good diversity in terms of race and ethnicity.

“A wealth of data is available that helps us think in a nonlinear fashion to get power out of small numbers,” she said.

A principal investigator for GeneSTAR, Dr. Mathias framed her approach around the idea of going deep. She investigates platelets and their precursors using transcriptomics, RNA-Seq data, expression quantitative trait loci (eQTL) and protein QTL (pQTL), and GWAS findings.

Examining cohorts and demographics using data from the Framingham Heart Study, Old Order Amish database, and GeneSTAR, she and colleagues found 104 variants associated with platelet aggregation in response to adenosine diphosphate, epinephrine, and collagen.

“Trait loci (TL) analysis in TOPMed will be [the] single largest dataset to investigate genetics of TL in the most diverse sample available,” Dr. Mathias said.

Her takeaway messages were the opportunity to understand the genetics of TL; the discovery of multiple novel loci with strong biological plausibility; and the new opportunity to examine TL-phenotype association for heart, lung, and blood disorders.

EHR for Asthma Insight

Bianca Himes, PhD, assistant professor of Informatics at Penn, shared methods for using various existing data sources such as electronic health records (EHRs) to examine asthma prevalence and treatment. 

While it is known that treatment according to clinical guidelines will control symptoms, asthma management is a multifactorial problem dependent upon medications, environmental exposure, social factors, genomics, transcriptomics, and epigenomics. Ethnicity and sex also play a role: Individuals of African American and Puerto Rican descent have higher rates of asthma, as do women. Geographical location also has a marked effect on prevalence and symptom control.

“Philadelphia, like other large cities, has a higher prevalence [of asthma] than the national average,” Dr. Himes said. 

Using EHRs in research has many benefits. These records provide low-cost access to longitudinal information on many patients representing real-life populations; facilitate contacting individuals for research studies; enable improvement of clinical workflows at the point of care; supply derived data for primary research; and are essential for the creation of large biobanks for omics studies.

Dr. Himes detailed her EHR-derived study of asthma exacerbations in Philadelphia County, which used 2014-2016 data from 1,568 patients with at least one outpatient encounter, a prescription for albuterol, and a primary ICD-9/10 code for asthma. Her analysis determined the prevalence of asthma exacerbations within specific residential areas.
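For readers curious how inclusion criteria like these translate into an analysis, the following is a minimal sketch in Python, not the study's actual code. The record layout, the field names, the use of ZIP code as the residential area, and the exacerbation count are all hypothetical; the only outside facts assumed are that ICD-9-CM asthma codes begin with 493 and ICD-10-CM asthma codes begin with J45.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR-derived data."""
    patient_id: str
    zip_code: str               # assumed stand-in for "residential area"
    outpatient_encounters: int  # number of outpatient visits in the study window
    has_albuterol_rx: bool      # any albuterol prescription on record
    primary_dx_codes: list      # primary ICD-9/10 diagnosis codes (strings)
    exacerbations: int          # assumed count of recorded exacerbations


def is_asthma_code(code):
    # ICD-9-CM asthma codes start with 493; ICD-10-CM asthma codes start with J45.
    return code.startswith("493") or code.startswith("J45")


def in_cohort(rec):
    # Mirrors the three criteria described in the article:
    # an outpatient encounter, an albuterol prescription, a primary asthma code.
    return (
        rec.outpatient_encounters >= 1
        and rec.has_albuterol_rx
        and any(is_asthma_code(c) for c in rec.primary_dx_codes)
    )


def exacerbation_share_by_area(records):
    """Share of cohort patients in each area with at least one exacerbation."""
    cohort = [r for r in records if in_cohort(r)]
    total = Counter(r.zip_code for r in cohort)
    with_exacerbation = Counter(r.zip_code for r in cohort if r.exacerbations > 0)
    return {area: with_exacerbation[area] / n for area, n in total.items()}
```

A real analysis would work at a finer geographic scale and adjust for population and demographics; this sketch only shows the filtering and grouping steps.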

The discovery of asthma hotspots created more questions: Were increased asthma diagnoses and symptom exacerbation related to genetics, environment, education and health literacy, socioeconomic factors, or all of the above?

“Short of wearing a personal monitoring device to measure exposure and fine particulate matter – which isn’t realistic, as the available device is just too big for a patient to reasonably wear – we have to turn to available data,” Dr. Himes said.

Linking EHR data with pollution studies can help to determine the contribution of genetics versus other factors. Useful resources include the American Community Survey; the University of Vermont Spatial Analysis Lab, which assesses tree cover; Annual Average Daily Traffic measurements; and air pollution data available from the Environmental Protection Agency. Penn Medicine BioBank data can also be integrated into EHR-based studies.

Other invited speakers throughout the day included Nicola Camp, PhD, Huntsman Cancer Institute; Peter Kraft, PhD, Harvard University; Braxton Mitchell, PhD, University of Maryland; Katie Pollard, PhD, Gladstone Institutes and University of California, San Francisco; and Justin Silverman, MD-PhD candidate at Duke University.

Winners at SAGES

Additional highlights included the mid-afternoon poster awards presentation. Congratulations to Stephen Cristiano, Johns Hopkins School of Medicine; Kanika Kanchan, Johns Hopkins University; David Lee, MD-PhD candidate, Computational Biology and Genomics at Penn; Ahmed Moustafa, BPharm, PhD, division of Pediatric Infectious Diseases at CHOP; Alessandro Testori, MD, PhD, postdoctoral fellow, Human Genetics and Molecular Biology at CHOP; and Chi-Yun Wu, PhD candidate, Genomic and Computational Biology at Penn.

Save the date for the next SAGES, June 5, 2020.