ChatGPT used to streamline medical record analysis in EduHeLx

The EduHeLx team at the Renaissance Computing Institute demonstrated time- and cost-saving capabilities of ChatGPT in an educational use case for a UNC-Chapel Hill clinical data science course.

In the past few months, ChatGPT has risen from relative obscurity to become a newsworthy technology known for its revolutionary artificial intelligence (AI) capabilities. The natural language processing chatbot was developed by OpenAI and is built on a family of large language models. This approach enables ChatGPT to generate relevant, conversational responses to natural-language prompts, making it one of the most advanced AI chatbots to date. ChatGPT’s AI capabilities have significant time- and cost-saving implications in many fields, including education, as recently demonstrated by the EduHeLx team at the Renaissance Computing Institute (RENCI), a data science research institute at UNC-Chapel Hill.

EduHeLx was used in the Spring 2023 UNC-Chapel Hill course CHIP690: Foundations of Clinical Data Science, which gives students hands-on training in Electronic Health Record analysis. The platform helps students understand how using this data effectively can advance clinical research and improve patient outcomes. The class used realistic, but synthetic, patient data downloaded as CSV files, which must be imported into a database (here, PostgreSQL) before they can be used for analysis. Before the import, one must first create the table definitions (also known as the schema) that will store the data; once the schema exists, importing the data is relatively easy. Although creating the schema is conceptually straightforward, it is time-consuming and tedious, and it is easy to miss subtle details. Jeff Waller, one of the EduHeLx developers who worked on this issue, stated, “Complicating matters more, there was also a time constraint and a rather large number of table definitions that needed to be created (34). Combined, this would easily account for hours worth of work.”

Given the time constraints and large number of files, the EduHeLx team turned to ChatGPT to automate the process. With just 20 lines of code, ChatGPT generated database schema definitions from the CSV files, as well as the “import statements” needed to load the contents of the CSV files into the database. The entire process took roughly 45 minutes, with the total cost amounting to only 20 cents. The team used the resulting schema definitions and import statements to construct the database and fill it with data, and the students were then given access to the data through a database login. Not only did ChatGPT expedite an otherwise tedious and time-consuming process for this course, but the solution is also general enough to be reused in future courses that need database schema definitions and import statements created from CSV files for use in EduHeLx.
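The team's script and ChatGPT's output are not reproduced in this article. As a rough illustration of what a "schema definition" and an "import statement" look like for one CSV file, the Python sketch below derives both from the file's header and sample rows. It uses simple type-guessing heuristics in place of ChatGPT, so it should be read as a stand-in under stated assumptions, not as the team's method. The table name, column names, file path, and helper functions are all hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical sample data; the course's actual CSV files are not shown here.
SAMPLE_CSV = (
    "id,birthdate,expenses,city\n"
    "1,1980-02-17,1520.50,Durham\n"
    "2,1975-11-03,,Chapel Hill\n"
)


def infer_pg_type(values):
    """Guess a PostgreSQL type that fits every non-empty sample value."""
    samples = [v for v in values if v != ""]
    if not samples:
        return "TEXT"

    def all_parse(parser):
        # True only if the parser accepts every sample without error.
        for value in samples:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "DOUBLE PRECISION"
    if all_parse(date.fromisoformat):
        return "DATE"
    return "TEXT"


def quote_ident(name):
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def schema_and_import(table, csv_text, csv_path):
    """Return (CREATE TABLE statement, COPY statement) for one CSV file."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in rows if i < len(row)]
        columns.append(f"    {quote_ident(name)} {infer_pg_type(col_values)}")

    create = (
        f"CREATE TABLE {quote_ident(table)} (\n" + ",\n".join(columns) + "\n);"
    )
    # In csv format, COPY reads unquoted empty fields as NULL by default.
    path_literal = csv_path.replace("'", "''")
    copy = (
        f"COPY {quote_ident(table)} FROM '{path_literal}' "
        "WITH (FORMAT csv, HEADER true);"
    )
    return create, copy


if __name__ == "__main__":
    create_sql, copy_sql = schema_and_import(
        "patients", SAMPLE_CSV, "/data/patients.csv"
    )
    print(create_sql)
    print(copy_sql)
```

Note that `COPY ... FROM 'file'` reads the file on the database server; when the CSV files live on the client machine, psql's `\copy` meta-command is the usual alternative. Heuristics like these miss exactly the kind of subtle details mentioned above (keys, constraints, ambiguous date formats), so any generated schema still deserves a manual review.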

This use case demonstrates the utility of both ChatGPT and EduHeLx, as both proved essential to students’ success in their hands-on analysis training. In addition to CHIP690, EduHeLx has been successfully deployed in the UNC-Chapel Hill course, COMP116: Introduction to Scientific Programming, in Fall 2021 and Spring 2022. Given its unique cloud-based programming capabilities, EduHeLx has the potential to serve as an essential resource for many other courses, particularly those developed and cross-listed by the new UNC School of Data Science and Society (SDSS). 

Looking ahead, the EduHeLx team plans to continue optimizing the platform. Future plans include incorporating Otter-Grader, a tool developed at the University of California, Berkeley that provides auto-grading capabilities and real-time error and efficiency feedback to students. This will further increase EduHeLx’s utility in programming-based courses, improving the teaching and learning experience for both instructors and students.

EduHeLx is looking for pilot instructors interested in using the platform in their data science courses. Reach out to helx@lists.renci.org if interested. 

EduHeLx is an education-focused instance of HeLx, a scalable cloud-based computing platform developed by researchers at RENCI. HeLx offers a suite of tools, capabilities, and workspaces, enabling research communities to deploy custom data science workspaces securely in the cloud. EduHeLx was developed to address the needs of courses with programming components and currently supports programming in Python and R. For more information, see the earlier blog post about EduHeLx.