National Consortium for Data Science awards fellowships to researchers working to advance data science

CHAPEL HILL, NC, Nov. 19, 2013 – The National Consortium for Data Science (NCDS) has named five faculty members at North Carolina universities as its inaugural Data Science Faculty Fellows.

The Faculty Fellows will each receive $30,000 to support research projects that address novel data science research questions. Their work is expected to advance the mission and vision of the NCDS, which formed in early 2013 as a public-private partnership to address the challenges of collecting, sharing and using large, diverse data collections, or big data. Twenty faculty members from seven institutions applied for the Fellowships, and their proposals were reviewed by a subcommittee drawn from NCDS member institutions.

“This is our first effort to support scientists involved in research that shows promise for advancing data science and unleashing the power of big data for discovery and economic competitiveness,” said Stan Ahalt, chair of the NCDS steering committee and director of UNC-Chapel Hill’s Renaissance Computing Institute (RENCI), one of the founding members of the consortium. “We are thrilled with these awards and the quality of all the proposals we received from NCDS academic institutions and from across the UNC System. We expect their work to help advance the field of data science and bridge the gap between data science researchers and professionals in industry and government who depend on big data.”

The five 2014 NCDS Data Science Faculty Fellows and their projects are:

Rajeev Agrawal, PhD, assistant professor, department of electronics, computer and information technology, North Carolina A & T State University. Designing Sustainable and Domain Neutral Next Generation Data Infrastructure to Advance Big Data Science.

This project will develop design specifications for a sustainable data infrastructure that scientists in any research community can use to tackle data-intensive problems. Such problems, which range from understanding global environmental issues to reverse engineering the brain to using genomic sequencing to understand disease, require a technical infrastructure that works across computer platforms and scientific domains, supports collaboration among researchers at different locations, and can manage, analyze and store huge data sets. The resulting infrastructure could also serve as a tool for data science education and workforce development.

Jane Greenberg, PhD, professor, School of Information and Library Science, UNC-Chapel Hill, and director, Metadata Research Center. The Metadata Capital Initiative.

Metadata, or data about data, is crucial if data is to be reused, shared or repurposed over time. This project will expand on Greenberg’s ongoing work to understand “metadata capital,” or the value of metadata, as measured by net gain or loss, and how that value changes over time. The work will use case studies, collaborative workflow modeling and content analysis to study metadata capital scientifically. It will investigate data environments at the National Institute of Environmental Health Sciences, SAS and RTI, all NCDS member institutions, and will also consider data from NCDS member institutions more broadly.

Blair Sullivan, PhD, assistant professor, department of computer science, North Carolina State University. Tracking Community Evolution in Dynamic Graph Data Using Tree-like Structure.

As the amount of available research data has exploded, methods for managing, analyzing and visualizing that information have not kept up, especially for graph or relational data sets. This work will focus on a key task in improving the analysis of graph data: identifying and tracking overlapping groups of similar entities (e.g., people, samples, genes) over time. Such data sets often contain tree-like patterns of connections. The research will develop new methods for building a hierarchy of overlapping groups from a combination of the k-core and tree decompositions of a network, and will explore how that hierarchy evolves in time-dependent graph data. The goal is to develop new algorithms that will improve data analysis and workflow in fields as diverse as network analysis, healthcare policy, materials science, climate simulation, fluid dynamics, bioinformatics and cyber security.
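One of the two building blocks named above, the k-core decomposition, assigns each node a core number: the largest k such that the node belongs to a subgraph in which every node has at least k neighbors. The sketch below illustrates only the standard "peeling" algorithm for computing core numbers on a small undirected graph stored as an adjacency dictionary. It is not the project's method and does not cover tree decompositions or time-dependent data; the function name `core_numbers` and the example graph are hypothetical, and the graph is assumed to be undirected (symmetric adjacency sets) with no self-loops.

```python
from typing import Dict, Hashable, Set


def core_numbers(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Return the core number of every node in an undirected graph.

    Assumes `adj` is symmetric (w in adj[v] iff v in adj[w]) and has no
    self-loops. Peeling approach: repeatedly remove a remaining node of
    minimum current degree; a node's core number is the largest such
    minimum degree seen up to the moment it is removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed: Set[Hashable] = set()
    core: Dict[Hashable, int] = {}
    k = 0
    while len(removed) < len(adj):
        # Pick a remaining node with the smallest current degree (O(n) scan,
        # fine for illustration; bucket queues make this linear overall).
        v = min((u for u in adj if u not in removed), key=lambda u: degree[u])
        k = max(k, degree[v])
        core[v] = k
        removed.add(v)
        # Removing v lowers the degree of each neighbor still in the graph.
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
    return core


if __name__ == "__main__":
    # Hypothetical graph: triangle a-b-c, with d attached to a and e to d.
    example = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a", "e"},
        "e": {"d"},
    }
    # Triangle nodes get core number 2; d and e get core number 1.
    print(core_numbers(example))
```

Nested k-cores (every 2-core node is also in the 1-core) are one natural source of the kind of group hierarchy the project describes, though how the researchers combine this with tree decompositions is not specified here.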

Wlodek Zadrozny, PhD, associate professor, College of Computing and Informatics, UNC-Charlotte. Searchable Repository of Resilience and Sustainability Technologies.

This project aims to build a searchable data repository of technologies related to resilience and sustainability (R&S) using advanced information retrieval and text processing methods. Initial data will come from a set of U.S. patents and patent applications that contain thousands of solutions to R&S problems. The project will take an approach to semantic analysis and data preparation partly inspired by the IBM Watson project: a task-based document format, semantic search, and multidimensional scoring of search results.

Justin Zahn, department of computer science, North Carolina A & T State University. COMDET: A Novel Community Detection System for Large Networks.

This work seeks to develop a game-theoretic model of community evolution in large networks, including social and biological networks. It will study the structure and dynamics of network communities, with the goal of inventing novel methods for detecting network communities and building predictive models of group behavior using massive data sets, data mining and machine learning. A better understanding of network communities can inform public policy, health strategies, product development and advertising; in biological networks, it can shed light on the functions of cells, proteins and genes.

The Data Science Faculty Fellows have one-year appointments that begin Jan. 1, 2014. NCDS membership dues and supplemental funding from UNC General Administration support the Fellows Program.

The NCDS launched in April 2013 to address the challenges and opportunities posed by the massive data sets being created by digital medicine, environmental sensors, scientific instruments, social networks and more. Its goals include identifying key data science challenges; encouraging data science research that spans academia, industry and government; facilitating improved data science education; supporting technical, ethical and policy standards for data; and applying data science expertise to many societal problems and scientific disciplines, including genomics, environmental sciences, energy, sustainability, social and population studies, and materials science. NCDS founding members are RENCI, UNC-Chapel Hill, Cisco, Drexel University, Duke University, GE, IBM, MCNC, the National Institute of Environmental Health Sciences, North Carolina State University, SAS, RTI, Texas A & M University, UNC-Charlotte, UNC General Administration, and the U.S. Environmental Protection Agency.

For more information, see

Media Contact:
Karen Green, 919.619.8213,
