A new system designed by CHOP informatics experts aims to help investigators assemble, access and analyze large amounts of health data without requiring researchers to become specialized database technicians themselves. Developed by the Center for Biomedical Informatics (CBMi), the Harvest toolkit is an open-source, highly interactive framework that lets users quickly navigate different types and levels of data.
“We want to help researchers explore their data, not their database,” said Byron Ruth, lead developer of Harvest at CBMi. Ruth is a co-author of a paper, published recently in the Journal of the American Medical Informatics Association, that introduced CBMi’s Harvest framework.
“Research institutions typically work through their information technology staff to provide a single data warehouse that may be too general-purpose for all its projects, or develop one-off solutions on a case-by-case basis for each project,” said co-author Michael J. Italia, CBMi’s manager of Applications Research. “Our approach in Harvest is different. We decided to focus on end-users, generalizing the toolkit for application to any biomedical study with multiple collaborators, but also allowing individual software developers and data managers to customize the software for specific projects.”
Harvest lets users move smoothly among various levels of data, from individual patient records to aggregated reports covering all patients in a database. Users can construct queries to slice and dice the data: grouping subjects by age or ethnicity, for instance, calling up individual blood test results or MRIs, or including or excluding specific diagnoses. Harvest also makes the data transparent and visible in a form that is familiar to researchers invested in a particular disease or project.
The developers say their tool reflects the growing complexity of research in the Big Data era of electronic health records and genomic technology. In the 1980s and 1990s, much federal research followed a hypothesis-driven model, focusing on predefined measurements within a patient population. Today, many databases collect vast amounts of data of many types, with fewer preconceived notions of what is significant.
The CBMi team originally designed Harvest to manage data from AudGenDB, an audiology database funded by the National Institute on Deafness and Other Communication Disorders. In the current paper, the study team evaluated Harvest by applying it to two other data collections, and CBMi staff are now working to apply Harvest to additional sources.