Data analysis and visualization are helping answer a variety of questions about COVID-19, such as who is most at risk, how the disease is spreading, and what treatment approaches might work best. However, setting up a computing environment to analyze the large amounts of data needed to answer such questions is no easy task. It requires selecting data libraries, software, and hardware and estimating how much memory and computing power will be needed. This process is time-consuming, and few individuals have the complex skill set needed to accomplish it.
RENCI scientists have developed a new digital data science laboratory called Blackbalsam that can significantly shorten the planning stage for these efforts by providing a standardized environment that houses computational tools and data sets for COVID-19 analytics.
“As COVID-19 progressed, I saw that researchers were conducting analyses and visualization on an increasingly varied set of COVID-19 data,” said Blackbalsam co-author Steven Cox, assistant director of software systems architecture at RENCI. “I realized that it would be very helpful to have an environment that overcomes well-known technological and skill barriers by providing an interface that researchers with statistical, analytical, and visualization skills could use.”
The best tools, all in one place
Blackbalsam eliminates the need for each user to assemble all the required computational and data instruments from scratch by bringing together the best and newest technology for cluster computing, artificial intelligence, and visualization in a cloud-ready and open-source environment.
The creation of Blackbalsam draws on RENCI’s previous experience in bringing analytical tools to data scientists. “We’ve worked with communities at the National Science Foundation and National Institutes of Health in the areas of imaging, artificial intelligence, and knowledge graphs,” said Cox. “This has given us a wide and varied exposure to the kinds of tools scientists use and also the kinds of challenges that they face.”
Rather than selecting a specific tool to perform each function, the developers decided that the interface should provide a suite of artificial intelligence analysis capabilities and visualization environments so that users can use what works best for them. The platform also provides built-in tools for sharing and scaling up analyses.
Because the Blackbalsam infrastructure is cloud-ready, a scientist performing an unusually large analysis can move it to the cloud without changing any fundamental processes. It was also important for the tools available on Blackbalsam to be open source, because proprietary analytics software can be expensive and can create a significant barrier to researchers reproducing and building on each other’s work.
Air pollution and COVID-19
RENCI researchers plan to use Blackbalsam to examine COVID-19 and fine particulate matter pollution in North Carolina. “We’ve been involved in an ongoing project that has collected data on this type of air pollution in the state,” said Cox. “When we saw the research about higher mortality rates for COVID-19 in the presence of particulate matter, it created the opportunity to investigate the impact of COVID-19 in communities based on pollution levels.”
Although COVID-19 is the current focus of Blackbalsam, the design process accounts for the continuing need for unified analytics platforms that allow scientists to get started quickly. “We don’t want to be in the situation again where something happens that requires many people to quickly begin analyzing data, but we don’t have a consolidated environment ready for it,” said Cox.
Blackbalsam can be accessed at https://github.com/helxplatform/blackbalsam.
By Anne Johnson, Lead Science Writer at Creative Science Writing