National Consortium for Data Science Names 2015 Data Fellows

CHAPEL HILL, NC, December 8, 2014 – The National Consortium for Data Science (NCDS), a public-private partnership to advance data science and address the challenges and opportunities of big data, today named three faculty members at three different universities as NCDS Data Fellows for the 2015 calendar year.

Each Data Fellow will receive $50,000 to support work that addresses data science research issues in novel and innovative ways. Their work will be expected to advance the mission and vision of the NCDS, which formed in early 2013. Data Fellow positions are open to faculty members at NCDS member institutions, which include universities in the University of North Carolina system, Duke University, Texas A&M University, and Drexel University. A wide range of researchers from six different member universities applied for the Fellowships. Their research proposals addressed many of the hot topics in data science, from cybersecurity to applying the techniques used by online music databases to develop more precise search algorithms and interest students in data science.

“This is the second year we’ve provided Data Fellows awards and we believe the program is a great way to bring together talented faculty researchers and our industry members who are interested in the practical applications of their work,” said Stan Ahalt, chair of the NCDS steering committee and director of UNC Chapel Hill’s Renaissance Computing Institute (RENCI), one of the founding members of the consortium. “We had applications from across our membership and the quality was outstanding. I know our members look forward to learning more about our new Fellows and to understanding how their research will advance data science and help organizations in business, government, and academia address their data challenges.”

The 2015 NCDS Data Fellows and their projects are:


David Gotz, PhD, associate professor, School of Information and Library Science, UNC Chapel Hill, and assistant director of the Carolina Health Informatics Program. Visual Analytics for Large-scale Temporal Event Data.

Large-scale temporal event data sets can contain vast numbers of long and complex sequences of time-stamped events and are found in a wide range of application domains, including social networking activity, security logs, and electronic health records. This project will develop novel visual analytics methods to support exploratory analysis of temporal event data sets. The work is motivated by population health researchers who explore large collections of electronic medical record (EMR) data. More effective analysis methods for deriving insights from temporal event data, such as medical diagnoses, procedures performed, lab tests, and medications prescribed, can provide evidence to support more personalized medical decision making and better health outcomes for patients. Such methods can also be used in comparative effectiveness studies, epidemiological studies, and patient-centered outcomes research. However, current methods for exploring temporal event data and selecting subgroups for analysis are complicated and time consuming. Gotz plans to develop software for comprehensive visual analytics of these data in a way that is simpler, more intuitive, and much less time consuming for practitioners.


Erik Saule, PhD, assistant professor, department of computer science, UNC Charlotte. Toward Machine Oblivious Graph Analysis.

Graphs are a popular tool used to model a wide range of phenomena and to show the relationships among various entities. For example, graphs can be used to model the physical path of city streets or aisles in a store in order to analyze traffic patterns and determine the best locations for businesses or for products within a retail store. In medicine, researchers use graphs to model regulatory pathways and gene expression, predict conditions, and identify the best drugs to use in treatments. Unfortunately, the explosion of digital data has led to a similar explosion in the computational costs of running graph analyses. New algorithms to deal with this challenge are usually inflexible, requiring the researcher to use a specific graph representation or a particular type of computer system for analysis. This project aims to develop a framework for performing efficient graph analysis regardless of the type of analysis being performed or the computer system used.


Erjia Yan, PhD, assistant professor, College of Computing and Informatics, Drexel University. Assessing the Impact of Data and Software on Science Using Hybrid Metrics.

In the age of data, the critical components of scientific and industrial research are increasingly data and software. These products can have significant impacts on future scientific discoveries and business innovation. Yet they can be difficult to discover and assess because new knowledge is still catalogued in the form of published research papers. This project will address the problem by identifying referencing patterns and designing hybrid metrics to assess the full impact of data sets and software. Unlike current data repository indexing, the project aims to provide context-driven, full-text data analytics for data and software in order to account for the unsystematic ways in which these products are cited in scientific literature, including hyperlinks to web pages, footnotes, endnotes, and digital object identifiers. Ultimately, the project seeks to develop a system that will comprehensively capture the impact of data and software on knowledge production and discovery.


This is the second year of the NCDS Data Fellows Program. NCDS membership dues and supplemental funding from UNC General Administration support the program. For more information, please see

The NCDS also extends its thanks to the members who served on the 2015 Data Fellows selection committee:

  • Larry Alexander, Drexel University
  • Tom Carsey, Odum Institute, UNC Chapel Hill
  • Matthew Drahzal, IBM
  • Steve Gustafson, GE
  • Russ Gyurek, Cisco
  • Craig Hill, RTI International
  • John Moore, MCNC