ChatGPT used to streamline medical record analysis in EduHeLx

The EduHeLx team at the Renaissance Computing Institute demonstrated time- and cost-saving capabilities of ChatGPT in an educational use case for a UNC-Chapel Hill clinical data science course.

In the past few months, ChatGPT has risen from relative obscurity to a newsworthy technology for its artificial intelligence (AI) capabilities. The natural language processing chatbot was developed by OpenAI and is built on top of families of large language models. This approach enables ChatGPT to generate detailed, conversational responses to natural-language prompts, making it one of the most advanced AI chatbots to date. ChatGPT’s AI capabilities have significant time- and cost-saving implications in many settings, including education, as the EduHeLx team at the Renaissance Computing Institute (RENCI), a data science research institute at UNC-Chapel Hill, recently demonstrated.

EduHeLx was used in the Spring 2023 UNC-Chapel Hill course CHIP690: Foundations of Clinical Data Science, which gives students hands-on training in Electronic Health Record (EHR) analysis. The platform helps students understand how using these data effectively can advance clinical research and improve patient outcomes. The class used realistic but synthetic patient data downloaded as CSV files, which must be imported into a database (here, PostgreSQL) before they can be used for analysis. An important first step is to create the table definitions (also known as the schema) that will store the data, after which importing the data is relatively easy. Although creating the definitions is straightforward, it is time-consuming, tedious, and prone to missed details. Jeff Waller, one of the EduHeLx developers who worked on this issue, stated, “Complicating matters more, there was also a time constraint and a rather large number of table definitions that needed to be created (34). Combined, this would easily account for hours worth of work.”

Given the time constraints and large number of files, the EduHeLx team turned to ChatGPT to automate the process. With just 20 lines of code, ChatGPT generated database schema definitions from the CSV files, as well as the “import statements” needed to import the contents of the CSV files into the database. The entire process took roughly 45 minutes, with the total cost amounting to only 20 cents. The team used the resulting data import statements to construct the database and fill it with data, and the students were then given access to the data via database login. Not only did ChatGPT expedite an otherwise tedious and time-consuming process for this course, but this solution is general enough to be reusable for future courses where it is necessary to create database schema definitions and import statements from CSV files for use in EduHeLx. 
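To make the two outputs described above concrete, here is a minimal Python sketch of what a table definition and a matching import statement for one CSV file might look like. It is not the team's ChatGPT-based code, which is not shown in this post; it swaps in simple rule-based type inference instead. The table name, sample rows, file path, and the helper names `infer_pg_type`, `quote_ident`, and `schema_and_import` are all hypothetical.

```python
import csv
import io


def infer_pg_type(values):
    """Pick a PostgreSQL type that fits every non-empty sample value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"
    for pg_type, parser in (("BIGINT", int), ("DOUBLE PRECISION", float)):
        try:
            for v in non_empty:
                parser(v)
            return pg_type
        except ValueError:
            continue
    # Dates and everything else fall back to TEXT in this simplified sketch.
    return "TEXT"


def quote_ident(name):
    """Quote an identifier so unusual column names remain valid SQL."""
    return '"' + name.replace('"', '""') + '"'


def schema_and_import(table, csv_text, csv_path):
    """Return a CREATE TABLE statement and a psql \\copy command for one CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in samples if i < len(row)]
        columns.append(f"    {quote_ident(name)} {infer_pg_type(col_values)}")
    create = f"CREATE TABLE {quote_ident(table)} (\n" + ",\n".join(columns) + "\n);"
    copy = f"\\copy {quote_ident(table)} FROM '{csv_path}' WITH (FORMAT csv, HEADER true)"
    return create, copy


# Hypothetical synthetic rows, for illustration only.
sample = "id,birthdate,income\n1,1980-02-17,52000.5\n2,1975-11-03,\n"
create_sql, copy_cmd = schema_and_import("patients", sample, "patients.csv")
print(create_sql)
print(copy_cmd)
```

A rule-based approach like this misses the subtle details mentioned earlier (dates, identifiers with leading zeros, keys), which is presumably part of why a language model was attractive for 34 tables under time pressure.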

This use case demonstrates the utility of both ChatGPT and EduHeLx, as both proved essential to students’ success in their hands-on analysis training. In addition to CHIP690, EduHeLx has been successfully deployed in the UNC-Chapel Hill course, COMP116: Introduction to Scientific Programming, in Fall 2021 and Spring 2022. Given its unique cloud-based programming capabilities, EduHeLx has the potential to serve as an essential resource for many other courses, particularly those developed and cross-listed by the new UNC School of Data Science and Society (SDSS). 

Looking ahead, the EduHeLx team plans to continue optimizing the platform. Future plans include incorporating Otter-Grader, a tool developed at the University of California, Berkeley, that provides auto-grading capabilities and real-time error and efficiency feedback to students. This addition will further increase EduHeLx’s utility in programming-based courses, improving the teaching experience for instructors and the learning experience for students.

EduHeLx is looking for pilot instructors interested in using the platform in their data science courses. Reach out to helx@lists.renci.org if interested. 

EduHeLx is an education-focused instance of HeLx, a scalable cloud-based computing platform developed by researchers at RENCI. HeLx offers a suite of tools, capabilities, and workspaces, enabling research communities to deploy custom data science workspaces securely in the cloud. EduHeLx was developed to address the needs of courses with programming components and currently supports programming using Python and R. For more information, see an earlier blog post about EduHeLx.

NC researchers come together to harness the power of clinical and environmental health data

In an increasingly interconnected world, the integration of clinical and environmental health data holds immense potential for advancing research, improving patient outcomes, and shaping the future of healthcare. However, to truly make an impact on individuals and communities, institutional and scientific silos that hinder collaboration and resource sharing must be overcome.

Recognizing this challenge, Cavin Ward-Caviness, PhD (US Environmental Protection Agency [US EPA]), Charles Schmitt, PhD (National Institute of Environmental Health Sciences [NIEHS]), and Karamarie Fecho, PhD, Ashok Krishnamurthy, PhD, and Sarah Tyndall (Renaissance Computing Institute [RENCI]) organized the inaugural Clinical and Environmental Health Data Workshop on Friday, May 19 at RENCI in Chapel Hill, NC.

“Pooling resources and expertise has the potential to catalyze groundbreaking research initiatives and identify previously unseen connections between environmental factors and human health outcomes,” according to Ashok Krishnamurthy, PhD, director of RENCI. “We are thrilled to be able to come together with our partners at NIEHS and the US EPA to work collaboratively on these hard – but impactful – problems.”

At the heart of this endeavor lies the ultimate goal of improving patient outcomes. By integrating clinical data, such as medical records and patient histories, with environmental data, researchers can gain deeper insights into the complex interplay between individual health and environmental factors. This holistic approach can lead to targeted interventions and personalized care plans.

The fusion of clinical and environmental health data not only benefits individual patients but also empowers communities. By leveraging integrated data, researchers and public health officials can identify environmental disparities, understand social determinants of health, and design evidence-based interventions tailored to specific communities. This knowledge equips policymakers with the tools needed to implement targeted interventions, allocate resources efficiently, and ensure the equitable distribution of healthcare services.

The half-day workshop brought together more than twenty local scientists, healthcare professionals, and environmental experts from the Research Triangle Park (RTP) region to discuss the current state of the art and the work that still needs to be done to turn these goals into reality.

The workshop began with several lightning talks where local leaders gave presentations on the tools, data, and methods in their research areas. Topics included:

  • Clinical informatics: This presentation focused specifically on standardizing Electronic Health Records (EHRs), the digital versions of patients' medical records. Converting EHRs to a standardized model would make them usable for research and extend their reach beyond local and state boundaries to national, cross-institutional analysis.
  • Geospatial modeling: This presentation focused on various methods for modeling environmental exposures and subsequent population outcomes, which sparked a discussion on how additional factors, such as geography, could be included in the models and how to integrate with relevant exposure events.
  • Social and environmental determinants of health: This presentation focused on how to integrate EHRs with social and environmental data, which would provide a deeper understanding of how environmental exposure connects to health.
  • Community and public health: This presentation addressed the complexities of public health issues and their solutions. One example showed how social determinants of health affect the outcomes of environmental health hazards, and the presentation emphasized the need for team science to tackle these complex issues.
  • Public health surveillance: This presentation described a tool for surveilling public health data, the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT). NC DETECT contains data from emergency departments, North Carolina Poison Control, and emergency medical services.
  • Data science and related tools: This presentation highlighted the NIH Strategic Plan for Data Science and NIH priorities around building a biomedical data ecosystem that supports data sharing. The NIEHS Climate, Health, and Outcomes Research Data (CHORD) project, funded by the PCORI Trust Fund, is intended to serve as an exemplar for geospatial-based climate data and tools.
  • Cyberinfrastructure and software applications: This presentation focused on the cyberinfrastructure that RENCI has been developing to support clinical and environmental health research. The emphasis was on the informed development of cyberinfrastructure designed to bridge gaps between geoscience models and their clinical and public health applications.

After the lightning talks, the group divided into breakout sessions focused on two themes: identifying gaps in integrating environmental and social health data, and creating a list of shared resources that can be used to address those gaps.

During the wrap-up session, there was robust discussion on establishing a vision and cadence for future workshops. Ultimately, the group plans to hold regular workshops to establish regional leadership in clinical and environmental health research, ensuring that the needs of local communities and stakeholders remain central to future initiatives. By nurturing the partnerships forged at this and future events, North Carolina can play a vital role in shaping the future of healthcare, driving transformative change, and moving toward a healthier and more sustainable future for all.

RENCI strengthens storm surge response capabilities

APSViz provides critical, high-resolution coastal hazards information to expedite decision-making and productivity

On September 28, 2022, Hurricane Ian made landfall along the west coast of Florida as a Category 4 hurricane, the strongest to hit the region since Hurricane Charley in 2004, causing substantial damage from strong winds and the resulting storm surge and wind waves. Hurricane Ian then crossed the Florida landmass, emerged into the Atlantic Ocean, strengthened back into a weak hurricane, and made a second landfall on the South Carolina coast. According to the National Oceanic and Atmospheric Administration (NOAA), the damage caused by Hurricane Ian in its two landfalls ranks it as the third-costliest weather disaster in U.S. history. This major event required multiple state and local agencies to prepare for significant storm impacts, assess potential damages, and plan for post-storm recovery activities.

Over the past three years, the Renaissance Computing Institute (RENCI), a data science research institute at UNC-Chapel Hill, has been developing a state-of-the-science, cloud-ready data engine, visualization, and information delivery system called APSViz. As a core project within the Department of Homeland Security’s Coastal Resilience Center at UNC-Chapel Hill, APSViz disseminates real-time coastal hazards information and enhances research productivity by making it much easier to understand computer simulations and predictions of coastal hazards. 

EduHeLx: A Cloud-based Programming Platform for Data Science Education

The EduHeLx pilot experiment informed future thinking about incorporating cloud-based technologies in UNC-CH courses, including courses in the new UNC-CH School of Data Science & Society (SDSS)

EduHeLx is an education-focused instance of HeLx, a scalable cloud-based computing platform developed by researchers at the Renaissance Computing Institute (RENCI), a data science research institute at UNC-Chapel Hill. HeLx offers a suite of tools, capabilities, and workspaces enabling research communities to deploy custom data science workspaces securely in the cloud. 

EduHeLx was developed to address the needs of courses with programming components and currently supports programming using Python and R. Previously, students were required to download a course’s programming software onto their own computers, and instructors had to work one-on-one with students to troubleshoot issues throughout the semester. This was so time-consuming that it took away from teaching time and derailed course schedules, especially in computer science courses with 250+ students. With EduHeLx, neither instructors nor students need to set up any infrastructure: students can access a course’s programming software in the cloud without downloading it, which saves a significant amount of class time.

New concept poised to accelerate drug discovery through data mining

RENCI scientists together with collaborators from UNC and other institutions have developed and defined a concept called Clinical Outcome Pathways (COPs) that could help scientists harness the vast amounts of clinical and biomedical data available today to accelerate drug discovery and drug repurposing.

“Improving drug discovery requires understanding all the biological processes involved in how drugs work,” said the paper’s first author Daniel Korn from the UNC-Chapel Hill Department of Computer Science. “COPs help broaden the concept of a drug’s mechanism of action so that knowledge graph mining can be used to discover the complete chain of events that enables a specific therapeutic effect for a drug.”

Knowledge graphs express data as a collection of nodes—such as drugs and diseases—with edges that represent the relationships—such as drug A treats disease B—between the nodes. By bringing together heterogeneous information into a single system, knowledge graphs can reveal relationships between previously unconnected information that wouldn’t be obvious otherwise.
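As a rough illustration of the node-and-edge structure just described, the Python sketch below stores a handful of made-up triples and searches for a chain of relationships linking a drug to a disease. The entities, relationship names, and the `find_path` helper are hypothetical and are not taken from the paper or from any real knowledge graph; real graph-mining systems are far larger and use richer query methods.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relationship, object) triples, for illustration only.
triples = [
    ("drug A", "inhibits", "protein P"),
    ("protein P", "regulates", "pathway Q"),
    ("pathway Q", "contributes to", "disease B"),
    ("drug C", "treats", "disease D"),
]

# Adjacency list: each node maps to its outgoing (relationship, neighbor) edges.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def find_path(graph, start, goal):
    """Breadth-first search returning the shortest chain of edges, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, predicate, neighbor)]))
    return None


path = find_path(graph, "drug A", "disease B")
for subject, predicate, obj in path:
    print(f"{subject} --{predicate}--> {obj}")
```

No single triple states that drug A is relevant to disease B; the connection only appears once the separate facts sit in one graph, which is the point the paragraph above makes about previously unconnected information.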

“The real power of the COPs concept is that once we understand all the biological pathways connecting drugs and diseases, that information can be used to develop new therapeutic agents—or repurpose existing ones—that modulate the same biological pathway,” explained the paper’s senior author Alexander Tropsha from the UNC Eshelman School of Pharmacy.

As described in a Drug Discovery Today paper, the researchers define COPs as a chain of key events—molecular initiating event, intermediate event(s), and the clinical outcome—that are responsible for the therapeutic actions of a drug. Each element of the chain corresponds to a term defined in commonly used biomedical ontologies, which allows computational methods to be used to elucidate COPs and provides a way for them to be cataloged for future use.

RENCI’s Advanced Cyberinfrastructure Support Team introduces updated research resources

The Advanced Cyberinfrastructure Support (ACIS) team at RENCI works to provide efficient, available resources for our researchers. Over the last several months, the team has introduced several new capabilities and tools that support researchers in successfully producing results from their computing research.

Use cases show Translator’s potential to expedite clinical research

RENCI investigators are contributing to the development of a platform called Biomedical Data Translator that will allow researchers to easily access and interrelate large amounts of data relevant to advancing biomedical research. Funded by the NIH’s National Center for Advancing Translational Sciences (NCATS), the new system is poised to accelerate translational clinical research by allowing users to approach biomedical questions from a holistic perspective to inspire important new research directions.

The platform is being developed by a 15-team multi-institutional Biomedical Data Translator consortium. Three of these teams include leadership from RENCI investigators. Although still a work in progress, Translator is being designed as an easy-to-use tool that can quickly respond to queries by identifying and synthesizing relevant data from a wide variety of sources.

New streamlined statistical method provides improved pattern detection and risk prediction for disease

The novel regression algorithm, CALF, outperforms the current gold standard, LASSO, in statistical tests

Researchers from the Renaissance Computing Institute (RENCI) at UNC-Chapel Hill, Perspectrix, the UNC School of Medicine, and the WVU Rockefeller Neuroscience Institute have collaborated to develop a new method for finding patterns in data which verifiably surpasses the performance of a generally accepted “gold standard.” 

Attempting to find patterns in data is central to all research, and it is particularly important in medical use of biological samples to predict a patient’s risk for disease formation and progression. Today, researchers can utilize advanced technology to produce an ocean of data about one person from various biological samples such as blood, DNA, and saliva, with the goal of identifying particular markers that can be informative about a person’s current health and future outlook. However, this advanced data collection and processing has outpaced current statistical methods for identifying simple but robust patterns and relationships, and this is particularly true for the field of psychiatry. For instance, researchers have yet to fully understand and predict the progression of schizophrenia. 

This new method, CALF, which stands for “coarse approximation linear function,” is described in the Scientific Reports paper, “A greedy regression algorithm with coarse weights offers novel advantages,” published on March 31, 2022. When applied to five quite different examples from psychiatric and neurological studies, CALF consistently outperformed the gold standard, LASSO (“least absolute shrinkage and selection operator”) regression, and other methods.
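For readers unfamiliar with the phrase "greedy regression with coarse weights," the Python sketch below shows the general idea only. It is not the published CALF algorithm. It assumes, purely for illustration, that each selected marker gets a weight of +1 or -1, that markers are added one at a time, and that a candidate is kept only if it improves the Pearson correlation between the weighted sum and the outcome. The scoring metric, the stopping rule, the data, and the function names are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def greedy_coarse_fit(features, target, max_terms=3):
    """Greedily pick features with weights in {+1, -1} (assumed weight set)."""
    weights = {}
    score = [0.0] * len(target)
    best = 0.0
    while len(weights) < max_terms:
        candidate = None
        for name, column in features.items():
            if name in weights:
                continue
            for w in (1, -1):
                trial = [s + w * x for s, x in zip(score, column)]
                r = pearson(trial, target)
                if r > best + 1e-12:  # keep only strict improvements
                    best, candidate = r, (name, w, trial)
        if candidate is None:
            break  # no remaining feature improves the fit
        name, w, score = candidate
        weights[name] = w
    return weights, best


# Made-up marker values; the outcome equals a - b, and c is unrelated noise.
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 2, 1, 2, 1],
    "c": [3, 1, 2, 3, 1, 2],
}
target = [-1, 1, 1, 3, 3, 5]
weights, best = greedy_coarse_fit(features, target)
print(weights, round(best, 3))
```

In this toy example the search keeps markers `a` and `b` and leaves out the noise marker `c`. The resulting model is just a short list of markers with signs, which is one reason coarse weights can be easier to interpret than finely tuned coefficients.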

New data format aids large-scale evolutionary biology research

In addition to revealing the hidden histories of life on Earth, studying the evolutionary relationships between organisms can help scientists track emerging diseases, inform methods to control invasive species, and understand how to best protect at-risk ecosystems.  

DNA sequencing and other genetic analysis approaches are providing vast new data streams to enable this research at unprecedented scales. For example, the Open Tree of Life Project is attempting to create a synthesized view of the evolutionary relationships among every known organism – more than 1.7 million species.

To aid in these endeavors, Gaurav Vaidya, PhD, from RENCI collaborated with a multi-institutional team of researchers to create a new data format that makes the clade definitions used by evolutionary biologists readable and interpretable by computers. A clade, which consists of an ancestor and all of its descendants, makes up a portion of a phylogeny, a set of evolutionary relationships between different organisms.

Biomedical Data Translator Platform moves to the next phase

Although we now have huge amounts of data on everything from genes to the causes of disease, it is stored in an enormous variety of ways and in many different locations. This makes it difficult, if not impossible, to find and use this data to think about biomedical questions in a big picture, holistic way.

The NIH’s National Center for Advancing Translational Sciences (NCATS) Biomedical Data Translator program is working to change this by funding a platform that allows scientists to easily access and interrelate data to inform new research directions. RENCI investigators are part of the leadership for three of the 15 teams that make up the Biomedical Data Translator consortium.

The Translator platform is designed to accelerate the development of new treatments and translational clinical research. For example, it could help uncover potential new therapies and drug targets, further elucidate how environmental exposures impact disease, and reveal new relationships between rare and common diseases.

“Translator offers a way of looking at a large amount of information – the equivalent to reading all the research papers ever published – and returning a reasonable amount of information,” said RENCI’s Chris Bizon, co-PI of the Translator standards and reference implementation team. “It provides a hypothesis that can be investigated and a list of information that will be helpful to this investigation.”
