EduHeLx: A Cloud-based Programming Platform for Data Science Education

The EduHeLx pilot experiment informed future thinking about incorporating cloud-based technologies in UNC-CH courses, including courses in the new UNC-CH School of Data Science & Society (SDSS)

EduHeLx is an education-focused instance of HeLx, a scalable cloud-based computing platform developed by researchers at the Renaissance Computing Institute (RENCI), a data science research institute at UNC-Chapel Hill. HeLx offers a suite of tools, capabilities, and workspaces enabling research communities to deploy custom data science workspaces securely in the cloud. 

EduHeLx was developed to address the needs of courses with programming components and currently supports programming in Python and R. Previously, students had to download a course’s programming software onto their own computers, and instructors had to work one-on-one with students to troubleshoot issues throughout the semester. This troubleshooting was so time-consuming that it cut into teaching time and derailed course schedules, especially in computer science courses with 250+ students. With EduHeLx, neither instructors nor students need to set up infrastructure: students can access a course’s programming software in the cloud without downloading it, which saves a significant amount of class time.

Emphasizing EduHeLx’s benefits, Ashok Krishnamurthy, Interim Director at RENCI and professor of Computer Science at UNC-Chapel Hill, stated, “We could concentrate on the instructional material for the course rather than spending time debugging installations on student’s laptops or other technology problems that unexpectedly crop up during the semester.” Additionally, EduHeLx allows instructors to send all course material through the platform and to enable auto-grading for exams and assignments, another time-saving capability that was not previously possible.

As a pilot experiment, UNC Information & Technology Services (ITS) assisted RENCI in applying EduHeLx as the educational platform in the UNC-Chapel Hill Computer Science course, COMP 116: Introduction to Scientific Programming, in Fall 2021 (Stan Ahalt/Ashok Krishnamurthy) and Spring 2022 (John Majikes). ITS provided technical support to deploy EduHeLx on UNC’s Google Cloud and assisted with adding 250+ student accounts; ITS also provided financial support for the cloud costs of deploying EduHeLx and helped ensure the security of the platform. RENCI and ITS both learned a great deal from this experiment, which helped inform ITS’ future engagement with cloud-based learning solutions.

ITS, which manages the University’s Google Cloud Platform (GCP) environment, set up monitoring and essential guardrails to protect University data and advised RENCI on best practices for efficiently managing the resources, said Chuck Crews, Manager of ITS Cloud Operations Group, and John Godehn, ITS Systems Programmer/Specialist.  

“One of the compelling reasons to deploy in the cloud is that you only pay for what you’re using,” instead of paying for resources to sit idle, Crews said. Working in the cloud allows for resources to be deployed, and undeployed, as needed. 

Given the innovative capabilities EduHeLx enables for data science education, the newly launched UNC-Chapel Hill School of Data Science & Society (SDSS) is considering making extensive use of EduHeLx for a range of courses. Dr. Stan Ahalt, Inaugural Dean of the SDSS, reported that the School hopes to use the platform as a mechanism to provide data and computation to students very early in the program, both in existing courses cross-listed with other departments and in new courses developed by the SDSS. Further elaborating on the novel utility of EduHeLx, Ahalt stated, “The ability to stand up an educational platform and reliably provision the data and computation through a relatively simple process will enable us to engage new students seamlessly, as well as provide a tool that will grow with them as they progress in their coursework and research.” 

One of the main focuses of the SDSS is preparing students for an evolving workforce that increasingly demands data science literacy, which necessitates an interdisciplinary approach to integrate data science programming into a wide range of courses, including courses in the humanities and social sciences. Additionally, the SDSS places significant emphasis on the final ‘S’ in its name, society: by introducing data science and its applications to students with diverse disciplinary interests, the SDSS can better prepare them to effectively apply data science in their careers of choice and maximize their impact on society. With its unique, accessible, and adaptable capabilities, EduHeLx has the potential to serve as a key resource to transform the SDSS’ vision into reality.

New concept poised to accelerate drug discovery through data mining

RENCI scientists together with collaborators from UNC and other institutions have developed and defined a concept called Clinical Outcome Pathways (COPs) that could help scientists harness the vast amounts of clinical and biomedical data available today to accelerate drug discovery and drug repurposing.

“Improving drug discovery requires understanding all the biological processes involved in how drugs work,” said the paper’s first author Daniel Korn from the UNC-Chapel Hill Department of Computer Science. “COPs help broaden the concept of a drug’s mechanism of action so that knowledge graph mining can be used to discover the complete chain of events that enables a specific therapeutic effect for a drug.”

Knowledge graphs express data as a collection of nodes—such as drugs and diseases—with edges that represent the relationships—such as drug A treats disease B—between the nodes. By bringing together heterogeneous information into a single system, knowledge graphs can reveal relationships between previously unconnected information that wouldn’t be obvious otherwise.
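
As a rough illustration of the idea, the sketch below stores a tiny knowledge graph as (subject, relationship, object) edges and searches it for a chain linking a drug to a disease. Every node name, relationship, and function name here is a made-up placeholder, not data or code from the researchers' work.

```python
from collections import deque

# Toy knowledge graph: each edge is (subject, relationship, object).
# All names below are hypothetical placeholders for illustration only.
EDGES = [
    ("drug A", "inhibits", "protein P"),
    ("protein P", "regulates", "pathway Q"),
    ("pathway Q", "contributes to", "disease B"),
    ("drug C", "treats", "disease D"),
]


def build_graph(edges):
    """Index edges by subject so each node's outgoing edges are easy to look up."""
    graph = {}
    for subject, relation, obj in edges:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of edges from start to goal.

    Returns a list of (subject, relationship, object) hops, or None if the
    two nodes are not connected.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


graph = build_graph(EDGES)
path = find_path(graph, "drug A", "disease B")
for hop in path:
    print(" ".join(hop))
```

No single edge in this toy graph says that drug A is related to disease B; the connection only appears once the separate facts are combined into one graph and traversed.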

“The real power of the COPs concept is that once we understand all the biological pathways connecting drugs and diseases, that information can be used to develop new therapeutic agents—or repurpose existing ones—that modulate the same biological pathway,” explained the paper’s senior author Alexander Tropsha from the UNC Eshelman School of Pharmacy.

As described in a Drug Discovery Today paper, the researchers define COPs as a chain of key events—molecular initiating event, intermediate event(s), and the clinical outcome—that are responsible for the therapeutic actions of a drug. Each element of the chain corresponds to a term defined in commonly used biomedical ontologies, which allows computational methods to be used to elucidate COPs and provides a way for them to be cataloged for future use.


RENCI’s Advanced Cyberinfrastructure Support Team introduces updated research resources

The Advanced Cyberinfrastructure Support (ACIS) team at RENCI works to provide efficient, available resources for our researchers. Over the last several months, the team has introduced several new capabilities and tools that support researchers in successfully producing results from their computing research.


Use cases show Translator’s potential to expedite clinical research

RENCI investigators are contributing to the development of a platform called Biomedical Data Translator that will allow researchers to easily access and interrelate large amounts of data relevant to advancing biomedical research. Funded by the NIH’s National Center for Advancing Translational Sciences (NCATS), the new system is poised to accelerate translational clinical research by allowing users to approach biomedical questions from a holistic perspective to inspire important new research directions.

The platform is being developed by a 15-team multi-institutional Biomedical Data Translator consortium. Three of these teams include leadership from RENCI investigators. Although still a work in progress, Translator is being designed as an easy-to-use tool that can quickly respond to queries by identifying and synthesizing relevant data from a wide variety of sources.


New streamlined statistical method provides improved pattern detection and risk prediction for disease

The novel regression algorithm, CALF, outperforms the current gold standard, LASSO, in statistical tests

Researchers from the Renaissance Computing Institute (RENCI) at UNC-Chapel Hill, Perspectrix, the UNC School of Medicine, and the WVU Rockefeller Neuroscience Institute have collaborated to develop a new method for finding patterns in data that verifiably surpasses the performance of a generally accepted “gold standard.”

Attempting to find patterns in data is central to all research, and it is particularly important in the medical use of biological samples to predict a patient’s risk for disease formation and progression. Today, researchers can use advanced technology to produce an ocean of data about one person from various biological samples such as blood, DNA, and saliva, with the goal of identifying particular markers that can be informative about a person’s current health and future outlook. However, this advanced data collection and processing has outpaced current statistical methods for identifying simple but robust patterns and relationships, particularly in the field of psychiatry. For instance, researchers have yet to fully understand and predict the progression of schizophrenia.

This new method, CALF, which stands for “coarse approximation linear function,” is described in the Scientific Reports paper, “A greedy regression algorithm with coarse weights offers novel advantages,” published on March 31, 2022. When applied to five quite different examples from psychiatric and neurological studies, CALF consistently outperformed the gold standard, LASSO (“least absolute shrinkage and selection operator”) regression, and other methods.
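
To give a feel for what a greedy regression with coarse weights might look like, here is a minimal sketch. It is not the published CALF implementation. It assumes that "coarse weights" means each selected predictor receives a weight of +1 or -1, and it uses Pearson correlation with the target as the score to improve; the function names, the stopping rule, and the toy data are all illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def greedy_coarse_fit(columns, target, max_terms=3):
    """Greedily build a weighted sum of predictors using only weights +1 or -1.

    columns: dict mapping predictor name -> list of values.
    Each round adds the single (predictor, weight) pair that most improves
    the correlation between the running sum and the target, and stops when
    no pair improves it (assumed stopping rule, for illustration).
    """
    weights = {}
    running_sum = [0.0] * len(target)
    best_score = 0.0
    for _ in range(max_terms):
        best = None
        for name, values in columns.items():
            if name in weights:
                continue
            for w in (1, -1):
                candidate = [s + w * v for s, v in zip(running_sum, values)]
                score = pearson(candidate, target)
                if score > best_score + 1e-12:
                    best_score, best = score, (name, w, candidate)
        if best is None:
            break
        name, w, running_sum = best
        weights[name] = w
    return weights, best_score


# Toy data (made up): the target equals a - b, and c is unrelated.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 4, 3, 6, 5],
    "c": [1, 1, 0, 0, 1, 1],
}
target = [-1, 1, -1, 1, -1, 1]
weights, score = greedy_coarse_fit(columns, target)
print(weights, score)
```

For contrast, LASSO fits real-valued coefficients and shrinks some of them to exactly zero through an L1 penalty, whereas this sketch only ever chooses among a handful of fixed weight values.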


New data format aids large-scale evolutionary biology research

In addition to revealing the hidden histories of life on Earth, studying the evolutionary relationships between organisms can help scientists track emerging diseases, inform methods to control invasive species, and understand how to best protect at-risk ecosystems.  

DNA sequencing and other genetic analysis approaches are providing vast new data streams to enable this research at unprecedented scales. For example, the Open Tree of Life Project is attempting to create a synthesized view of the evolutionary relationships among every known organism – more than 1.7 million species.

To aid in these endeavors, Gaurav Vaidya, PhD, from RENCI collaborated with a multi-institutional team of researchers to create a new data format that makes the clade definitions used by evolutionary biologists readable and interpretable by computers. Clades, which capture an organism’s ancestor and all its descendants, make up a portion of a phylogeny, a set of evolutionary relationships between different organisms.
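
The notion of a clade can be made concrete with a small sketch. The code below is not the new data format itself. It simply represents a toy phylogeny as a parent-to-children mapping and computes clades from it; the taxon names and function names are hypothetical, and treating a clade definition as "the smallest clade containing a given set of organisms" is an assumption made here for illustration.

```python
# Toy phylogeny as a mapping from each ancestor to its direct descendants.
# Taxon names are hypothetical placeholders.
TREE = {
    "root": ["X", "Y"],
    "X": ["A", "B"],
    "Y": ["C"],
}


def clade(tree, ancestor):
    """Return the ancestor followed by all of its descendants."""
    members = [ancestor]
    for child in tree.get(ancestor, []):
        members.extend(clade(tree, child))
    return members


def smallest_clade_containing(tree, root, taxa):
    """Return the smallest clade that includes every taxon in `taxa`.

    Walks down from the root while a single child's clade still contains
    all requested taxa. Returns None if some taxon is not in the tree.
    """
    wanted = set(taxa)
    if not wanted <= set(clade(tree, root)):
        return None
    node = root
    while True:
        for child in tree.get(node, []):
            if wanted <= set(clade(tree, child)):
                node = child
                break
        else:
            return clade(tree, node)


print(clade(TREE, "X"))
print(smallest_clade_containing(TREE, "root", ["A", "B"]))
print(smallest_clade_containing(TREE, "root", ["A", "C"]))
```

A definition written this way resolves to different sets of organisms depending on the phylogeny it is applied to, which is one reason a form that computers can interpret is useful.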


Biomedical Data Translator Platform moves to the next phase

Although we now have huge amounts of data on everything from genes to the causes of disease, it is stored in an enormous variety of ways and in many different locations. This makes it difficult, if not impossible, to find and use this data to think about biomedical questions in a big picture, holistic way.

The NIH’s National Center for Advancing Translational Sciences (NCATS) Biomedical Data Translator program is working to change this by funding a platform that allows scientists to easily access and interrelate data to inform new research directions. RENCI investigators are part of the leadership for three of the 15 teams that make up the Biomedical Data Translator consortium.

The Translator platform is designed to accelerate the development of new treatments and translational clinical research. For example, it could help uncover potential new therapies and drug targets, further elucidate how environmental exposures impact disease, and reveal new relationships between rare and common diseases.

“Translator offers a way of looking at a large amount of information – the equivalent to reading all the research papers ever published – and returning a reasonable amount of information,” said RENCI’s Chris Bizon, co-PI of the Translator standards and reference implementation team. “It provides a hypothesis that can be investigated and a list of information that will be helpful to this investigation.”


Drone projects take data processing and communication to new heights

Communicating after a natural disaster is often critical but can be challenging if telecommunications lines are damaged or wireless networks become overwhelmed. Drones, however, can be used to quickly create an on-demand communication infrastructure that is not only useful for emergency situations but can also be used for transportation, surveillance and crop monitoring. 

RENCI researchers are contributing to cutting-edge research projects that aim to make drones even more useful by improving how their data is handled and by providing a testbed that helps researchers optimize drone-based communication. 


RENCI researchers awarded 2021 Best Paper from the Elsevier FGCS Journal

RENCI researchers recently received the 2021 Best Paper Award from the Elsevier Future Generation Computer Systems (FGCS) Journal. The paper, titled “End-to-end online performance data capture and analysis for scientific workflows,” was co-authored by Cong Wang, Anirban Mandal, and collaborators from the DOE Panorama and RAMSES projects.

The FGCS Journal aims to lead the way in advances in distributed systems, collaborative environments, high performance computing (HPC), and big data on such infrastructures as grids, clouds, and the Internet of Things. Each year, the editorial board awards “Best Paper” to one submission featured in the journal.


STAR Program: Investing in the Next Generation of Leaders

As part of RENCI’s mission to be a leader in data science, our team is dedicated to helping the next generation of thinkers bring their ideas to the table, build valuable skill sets, and pursue professional growth. While we’ve hosted students in several areas of our work in the past, we have recently launched the Student Advancement at RENCI (STAR) Program to provide organization-wide support and resources. We are excited to expand our reach and engage with curious and hard-working young professionals across RENCI’s research groups, collaborations, and operations teams. 

“Working as an intern at RENCI has been a meaningful experience to me,” said Yifei Wang, Atlantic Wave-SDX research assistant and intern. “Colleagues and supervisors were super patient and helpful while helping me to grow from a student to a professional. RENCI is the perfect place if you want to pursue your academic and career goals.” 
