ROBOKOP technology offers faster, easier exploration of emerging COVID-19 research

As scientists around the world urgently work to understand the best ways to diagnose and treat COVID-19, quick and easy access to the latest research findings and rapid exploration of emerging data have become critical. RENCI scientists have developed new tools and approaches that can help researchers make important discoveries and answer key questions about COVID-19 in record time.

“These new approaches allow scientists to blend together novel observations and information from recent papers with previously known information that can be used to inform, contextualize, and test new COVID-19 information,” said Chris Bizon, director of analytics and data science at RENCI.

Finding associations in the literature

Literature co-occurrence databases help automate knowledge gathering by revealing meaningful insights based on patterns in, and the strength of, links between keywords in research papers. RENCI researchers used Semantic Scholar’s COVID-19 Open Research Dataset (CORD-19), a set of research papers covering COVID-19 and earlier coronaviruses, to create literature co-occurrence databases for COVID-19.

“Because the literature on COVID-19 is so new, the structured data we would prefer to use does not yet exist,” said Bizon. “We, therefore, used text mining to extract less precise associations that can be integrated into previously well-known data.”

Bizon and colleagues began by applying SciGraph natural language processing to identify biomedical entities, such as symptoms and drugs, in the CORD-19 literature and then determined which entities were frequently mentioned together. For example, if the SARS-CoV-2 spike protein often appeared in the same sentence as a certain chemical, the database would indicate a possible relationship between the two. They also performed this analysis using biomedical entities found by SciBite, a company offering semantic analytics software. The resulting co-occurrence information forms a graph of entities that can be used directly or integrated into other tools that aid in rapid data exploration, such as RENCI’s Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) question-answering system.
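The sentence-level co-occurrence step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RENCI’s pipeline: a small hypothetical dictionary (`ENTITY_TERMS`) with naive substring matching stands in for SciGraph or SciBite entity recognition, and raw sentence counts stand in for whatever association statistic the production databases use to weight links.

```python
from collections import Counter
from itertools import combinations

# Hypothetical term-to-identifier dictionary; a stand-in for a real
# biomedical entity recognizer such as SciGraph or SciBite.
ENTITY_TERMS = {
    "spike protein": "PROTEIN:spike",
    "ace2": "GENE:ACE2",
    "chloroquine": "CHEM:chloroquine",
    "fever": "SYMPTOM:fever",
}


def find_entities(sentence):
    """Return the set of entity IDs whose term appears in the sentence (naive substring match)."""
    text = sentence.lower()
    return {eid for term, eid in ENTITY_TERMS.items() if term in text}


def count_cooccurrences(sentences):
    """Count how many sentences mention each unordered pair of entities together."""
    pairs = Counter()
    for sentence in sentences:
        # Sorting gives each unordered pair a single canonical key.
        entities = sorted(find_entities(sentence))
        for a, b in combinations(entities, 2):
            pairs[(a, b)] += 1
    return pairs


def cooccurrence_edges(pairs, min_count=2):
    """Keep pairs seen in at least min_count sentences as weighted candidate edges."""
    return [(a, b, n) for (a, b), n in pairs.items() if n >= min_count]


if __name__ == "__main__":
    sentences = [
        "The spike protein binds ACE2 on host cells.",
        "Chloroquine may interfere with ACE2 glycosylation.",
        "Antibodies against the spike protein block ACE2 binding.",
        "Patients presented with fever and cough.",
    ]
    pairs = count_cooccurrences(sentences)
    print(cooccurrence_edges(pairs))
```

Each surviving `(entity, entity, count)` triple is one edge of a co-occurrence graph. The `min_count` threshold is an illustrative choice for filtering out pairs that appear together only by chance once; a real system would more likely score pairs against how often each entity appears on its own.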

Bringing information together

ROBOKOP is a biomedical question-answering system based on a knowledge graph, meaning that it expresses data as a collection of nodes—such as genes and diseases—and edges that represent the relationships between the nodes. ROBOKOP uses multiple open biomedical databases to explore links between various biomedical data types. For example, it can be used to examine connections between a disease and a drug and then explore which genes might be involved in that relationship.
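The node-and-edge structure, and the disease-to-drug-to-gene question mentioned above, can be illustrated with a toy in-memory graph. This sketch is not ROBOKOP’s storage layer or query interface; the class name `KnowledgeGraph`, the predicates, and the identifiers `DiseaseX`, `DrugY`, and `GENE_A` through `GENE_C` are hypothetical placeholders, not real biomedical data.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph: typed nodes plus labeled, directed edges (hypothetical, for illustration)."""

    def __init__(self):
        self.node_types = {}                # node id -> type, e.g. "gene"
        self.out_edges = defaultdict(list)  # subject -> [(predicate, object)]
        self.in_edges = defaultdict(list)   # object -> [(predicate, subject)]

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, subject, predicate, obj):
        # Index each edge in both directions so neighbors can be found from either end.
        self.out_edges[subject].append((predicate, obj))
        self.in_edges[obj].append((predicate, subject))

    def neighbors(self, node_id, node_type):
        """Return all nodes of node_type linked to node_id in either direction."""
        linked = [o for _, o in self.out_edges[node_id]]
        linked += [s for _, s in self.in_edges[node_id]]
        return {n for n in linked if self.node_types.get(n) == node_type}

    def connecting_genes(self, disease, drug):
        """Genes adjacent to both the disease and the drug (two-hop disease-gene-drug paths)."""
        return self.neighbors(disease, "gene") & self.neighbors(drug, "gene")


if __name__ == "__main__":
    kg = KnowledgeGraph()
    for node_id, node_type in [
        ("DiseaseX", "disease"),
        ("DrugY", "chemical"),
        ("GENE_A", "gene"),
        ("GENE_B", "gene"),
        ("GENE_C", "gene"),
    ]:
        kg.add_node(node_id, node_type)
    kg.add_edge("GENE_A", "associated_with", "DiseaseX")
    kg.add_edge("GENE_B", "associated_with", "DiseaseX")
    kg.add_edge("DrugY", "targets", "GENE_A")
    kg.add_edge("DrugY", "targets", "GENE_C")
    print(kg.connecting_genes("DiseaseX", "DrugY"))
```

Only `GENE_A` touches both the disease and the drug, so it is the one gene that could explain a link between them. A production system answers the same kind of question over millions of edges drawn from many sources, typically with a graph database rather than Python dictionaries.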

“Because of their structure, knowledge graphs are excellent at bringing together heterogeneous information into a single system so that it can be more easily explored,” said Bizon. “Two previously unconnected pieces of information will sometimes produce an ‘aha’ moment or unexpected discovery that wouldn’t be obvious otherwise.”

RENCI scientists have created a special COVID instance of ROBOKOP called COVID-KOP by integrating the system’s original knowledge graph with the CORD-19 literature co-occurrence graphs they developed, viral protein functions, and hand-curated symptom information. This combination gives scientists access to a vast amount of information about human genes, chemicals, and respiratory diseases and allows them to ask important questions about how that information relates to COVID-19. For example, researchers can use COVID-KOP to identify drugs that might be repurposed as a COVID-19 treatment or to uncover biochemical pathways that might be targeted with newly developed drugs.  

“This work is based on our participation in the NIH NCATS Data Translator project,” said Bizon. “The tools we developed as part of that consortium provided us with the capability to rapidly integrate data and produce a valuable data resource.”

The SciGraph and SciBite co-occurrence graphs are available at automat.renci.org.

The new COVID-19 version of ROBOKOP is available at covidkop.renci.org.

By Nancy Lamontagne, Senior Science Writer, Creative Science Writing


New digital laboratory helps get COVID-19 analyses up and running quickly

Data analysis and visualization are helping answer a variety of questions about COVID-19, such as who is most at risk, how the disease is spreading, and what approaches might work best for treatments. However, setting up a computer environment to analyze the large amounts of data needed to answer such questions is no easy task. It requires selecting data libraries, software, and hardware and estimating how much memory and computing power will be needed. This process is time-consuming, and few individuals have the complex skill set needed to accomplish it.

RENCI scientists have developed a new digital data science laboratory called Blackbalsam that can significantly shorten the planning stage for these efforts by providing a standardized environment that houses computational resources and data sets for COVID-19 analytics.

“As COVID-19 progressed, I saw that researchers were conducting analyses and visualization on an increasingly varied set of COVID-19 data,” said Blackbalsam co-author Steven Cox, assistant director of software systems architecture at RENCI. “I realized that it would be very helpful to have an environment that overcomes well-known technological and skill barriers by providing an interface that researchers with statistical, analytical, and visualization skills could use.”


Professor learns new lessons while teaching during a pandemic

When UNC students left for spring break on March 9, the COVID-19 public health crisis was just heating up. Soon after, UNC administrators made the decision to move to remote teaching and extended the break by a week to give instructors time to prepare. RENCI Deputy Director Ashok Krishnamurthy was one of many UNC professors who made the quick transition to teaching via video conferencing on Zoom.

What course were you teaching when you received notice that classes would all be moved online?

I was teaching a computer science course called Introduction to Scientific Programming that is designed for non-computer science majors. Most of the students take the class to learn programming skills for their day-to-day work or research. My section of the course had about 160 students enrolled.

How easily were you able to convert this class to a virtual format?

Fortunately, the course was relatively easy to adapt to virtual teaching. The UNC Computer Science department and my colleague John Majikes, who was teaching another section of the same course, had set up this course in such a way that taking it online was quite straightforward.


Beyond data: Supporting community during a pandemic

Families showing off their new face masks, donated by Sarah Davis.

When COVID-19 cases began to appear across the country, many RENCI employees felt a call to action. While several took it upon themselves to develop new data science technologies or to adapt existing ones to process COVID-19 data, others have contributed to communities in need by creating face masks, assisting food banks, connecting researchers to projects, and supporting foster youth.

Creating Face Masks

Like many across the nation, some RENCI employees have started sewing face masks to donate to medical workers, neighbors, and people in need.


FABRIC and iRODS events shift to virtual venues

Due to the current COVID-19 situation, many previously scheduled in-person events are moving to virtual spaces. Events associated with RENCI projects and with consortium and partner institutions are making decisions daily about whether to postpone, transition to virtual, or potentially proceed in the late summer and fall. We will keep our events calendar updated, so check back regularly for announcements. 

Two major events that have made the choice to transition to virtual are the FABRIC Community Workshop and the iRODS User Group Meeting. Although this change is unprecedented, both teams are adjusting their sessions to accommodate the virtual atmosphere and provide a memorable experience for attendees.


From the Director: RENCI responds to the COVID-19 crisis

The past few weeks have presented unique challenges to how we work, how we enjoy ourselves, and how we live our everyday lives. We are all worried about the uncertainties and about how this will affect us and our loved ones in the coming weeks. 

That being said, I am proud to work at RENCI and UNC-Chapel Hill, and blessed to be confronting the current challenges in a region where we are so fortunate to have skilled personnel and resources to bring to bear.

We can gather data and compute. We can volunteer. We can serve. We can encourage each other. We can broadcast. With relatively limited risk through working remotely, we can use our brains, our team spirit, our good will, our tools, and our machines to do science and serve.


A Radical New Tack: True Collaboration

Data Commons Pilot Phase teams plan how a rising tide of data and tools can float all research boats

Last November the National Institutes of Health announced $9 million in pilot funding to explore feasibility and best practices for a new approach to advancing biomedical research. The initiative, known as Data Commons, is focused on making digital objects—that is, the data, models, and analytical tools that constitute the engine behind the modern research enterprise—available through collaborative platforms.


RENCI Provides Insight on Data Science in Courtrooms

Stan Ahalt, Director, and Sarah Davis, Research Project Manager, attended the Science in the Courtroom Seminar for Resource Judges, held August 29-31, 2018, at the U.S. Court of Appeals for the Federal Circuit in Washington, DC. The seminar – organized by Franklin Zweig, Esq., of the National Courts and Sciences Institute and Dr. James Evans of the UNC Department of Genetics and Bryson Center for Judicial Science Education – is part of an ongoing science training program for state and federal judges from around the country, educating the judges to become resources on scientific issues for judges in their jurisdictions.


RENCI participates in NSF Cyber Carpentry workshop to prepare early-career researchers

From left: Andres Espindola-Camacho from Oklahoma State University, Jeremy Thorpe from Johns Hopkins University School of Medicine, Gaurav Kandoi from Iowa State University, and Yingru Xu from Duke University discuss an issue with their team project.

Big data is only getting bigger, and that can cause big problems for researchers who need to store and share their data. Twenty doctoral students and post-doctoral associates from across the country learned the tools and techniques to solve these problems at the inaugural Cyber Carpentry Workshop at the University of North Carolina at Chapel Hill. Sponsored by the National Science Foundation (NSF) and hosted by the UNC School of Information and Library Science (SILS), the two-week workshop in late July introduced students to a variety of applications, platforms, and processes for data life-cycle management and data-intensive computation. The Renaissance Computing Institute (RENCI) provided support for the workshop in the form of instructors and project management staff.


Strategies for hiring and maintaining a diverse data science workforce

RTI’s Kristina Brunelle (left) moderates a panel discussion with Amy Roussel, RTI (center); Gracie Johnson-Lopez, Diversity and HR Solutions (right); and Sackeena Gordon-Jones, Transformation Edge and NC State University (on screen).

Data science is hot. That’s good news for workers with data science skills. It also means organizations competing to hire data scientists need to understand how to recruit talent that will solve their data science challenges and contribute to creating a productive and diverse workforce.
