DataBridge shines a light on dark data

Chapel Hill, NC – Even in the age of big data, most research data is created by small research teams or individual investigators. These researchers collect their data, analyze it, and usually store it on a local hard drive or network, where future researchers cannot access it.

Individually, these data sets are small, but in the aggregate, they too can be defined as big data. In science, they are referred to as “dark data,” an untapped treasure trove of information that other researchers are unable to discover and use.

How can scientists find and use this vast pool of data, repurpose it for new research questions, and use it to spark new insights? DataBridge, a project led by the Renaissance Computing Institute (RENCI) at UNC-Chapel Hill, aims to make hidden dark data discoverable and available for investigation and collaboration.

DataBridge gathers metadata about data sets, including the scientific field of the data, when and where it was created or collected, and the methods used. It then uses relevance detection algorithms to find similarities between a newly ingested data set and other data sets in the system. The system clusters data sets into “communities” based on their similarities. When researchers use the DataBridge web interface, they can find similar and related data sets, much as Amazon.com recommends books based on past purchases or Facebook recommends new friends based on existing connections.
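As a rough illustration of the idea (not DataBridge's actual algorithms, which are not described here), the Python sketch below scores pairs of data sets by how many metadata values they share and groups any pair scoring above a threshold into the same community. The record fields, the Jaccard similarity measure, the 0.5 threshold, and all function names are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical metadata records; the field names are illustrative only.
DATASETS = {
    "survey_a": {"field": "sociology", "year": 2010, "place": "NC", "method": "survey"},
    "survey_b": {"field": "sociology", "year": 2010, "place": "NC", "method": "interview"},
    "ocean_c": {"field": "oceanography", "year": 2012, "place": "Atlantic", "method": "sensor"},
    "ocean_d": {"field": "oceanography", "year": 2013, "place": "Atlantic", "method": "sensor"},
}


def features(meta):
    """Turn a metadata record into a set of 'key=value' strings."""
    return {f"{key}={value}" for key, value in meta.items()}


def jaccard(a, b):
    """Share of features two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def communities(datasets, threshold=0.5):
    """Group data sets linked by a similarity at or above the threshold."""
    names = sorted(datasets)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of this group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        score = jaccard(features(datasets[a]), features(datasets[b]))
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


def related(datasets, name, threshold=0.5):
    """Return the other members of the community that contains `name`."""
    for group in communities(datasets, threshold):
        if name in group:
            return [other for other in group if other != name]
    return []


if __name__ == "__main__":
    print(communities(DATASETS))
    print(related(DATASETS, "survey_a"))
```

In this toy example the two sociology data sets share three of five distinct metadata values, so they land in one community and each is suggested as related to the other. A production system would likely use richer metadata and more sophisticated similarity measures.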

Finding dark data could have tremendous benefits for science. The lifespan of research data would extend well beyond a specific project, enabling it to contribute to scientific knowledge indefinitely. Researchers would be able to find previously undiscoverable data sets, expand their inquiries, and foster new collaborations. Society could reap more value from its scientific research investments.

DataBridge is funded by the National Science Foundation. Project partners include RENCI, the Data Intensive Cyber Environments (DICE) Center, and the Odum Institute, all at UNC-Chapel Hill. The team also includes researchers at North Carolina A&T State University and Harvard University.

The project’s three-year development cycle concludes this fall, and members of the research team recently produced a white paper documenting their work and early successes. To read the white paper, visit www.renci.org/White-Paper-2015-DataBridge.

Visit the DataBridge website at http://databridge.web.unc.edu/.