Experts and researchers balance the scales at the NSF Conference on Data Science and Law

Data Science and Law are both disciplines that are perceived to have high barriers to entry. With data science, outsiders are overwhelmed by the thought of having to understand hard math and complicated computer code, as the Chief Justice of the Supreme Court demonstrated when he called statistical evidence of political gerrymandering “sociological gobbledygook.” With respect to law, computer and data scientists feel unequipped to interpret the fairness and justice of their work and perhaps do not even see it as relevant. Many data practitioners believe, “I am just writing an algorithm. It’s math and data; I’m not responsible for what happens downstream.”

“As data increasingly affects all aspects of daily life, we cannot continue to let data science exist in a vacuum, without thinking of the legal, ethical, and societal implications that result from that math and data. We are being reminded daily of the inadequacy of legal frameworks and lack of governmental oversight of data protection, privacy, and security,” said Sarah Davis, senior project manager at RENCI. “Similarly, legal practitioners and researchers cannot ignore or willfully misunderstand the opportunities and dangers of a data-centric society. Increasingly, ‘black box’ algorithms will be used to make decisions that may attack privacy rights, violate due process, or discriminate against protected groups.”

Davis, who worked as an attorney prior to joining RENCI’s project management team, recognized the community effort and conversation required to create bridges between data science and law. To address this challenge, Davis partnered with the National Science Foundation (NSF) and assembled a planning committee with representatives from RENCI, UNC Department of Psychology and Neuroscience, Georgia State University, UNC School of Law, NC State University Department of Industrial and Systems Engineering, UNC Department of Public Policy, Duke University School of Law, and the National Courts and Sciences Institute. The end result was the NSF Conference on Data Science & Law, an event where experts and researchers in law, data science, computer science, and the social sciences could gather virtually to exchange ideas and develop promising avenues of research to bridge the gap between the two subjects.

The event began with a welcome and a statement of the goals for the conference, which included building an interdisciplinary community of law, data science, and social science practitioners to develop a new research network; highlighting the most promising theoretical frameworks and avenues of research, both for conference attendees and for the larger interdisciplinary communities; and keeping ethical decision making at the forefront in both law and data science. 

“There is an assumption among scientists from all backgrounds that it is not their responsibility to adjudicate the potential uses or ethical implications of their research. By hosting this event, we intend to build a community that will help enrich our understanding of the legal concepts and frameworks that exist in the system in order to better work within it,” said Stan Ahalt, director at RENCI. “There are substantial areas of research interest exactly at the intersection of data science and law, including artificial intelligence, cybersecurity, and health data management. For example, the NIH HEAL Initiative is focused on studying the opioid crisis, which poses several questions with significant health, data science, and law implications.”

Following introductions, the conference featured lightning talks and breakout sessions on four key topics that overlap in both fields. During the breakout sessions, attendees gathered their thoughts by using EasyRetro boards. A summary of topic abstracts and remaining research questions outlined by attendees can be viewed below.

  • Data Governance Standards for Research: Nearly every organization is now a data-driven organization, and organizational leaders in every corner of society must make data governance decisions to best serve their constituents. This topic asks how data can be leveraged in research to inform decision-making in a way that protects important social needs; how various data models can serve social data governance needs; what comprehensible and accessible criteria organizations might use to make choices about data access in order to serve all stakeholder needs; and how data governance characteristics can be communicated clearly through common language and accessible decision-support tools.
  • Evaluating Data Science Evidence and Experts: As Big Data becomes more widespread and central to people’s lives, the need to evaluate data properly becomes more pronounced, especially in the context of legal disputes. This topic covers issues such as the legal admissibility of data science evidence, the ability of legal factfinders to understand complex scientific evidence, and how data science experts can communicate their work effectively in court. With regard to admissibility, there is a need to establish norms for data validity, data veracity, and data provenance. Similarly, communicating the results of data analysis methods will require that the courts establish some baseline expectations on how complex quantitative information can be explained in a comprehensible manner.
  • Data and Legal Frameworks in the Digital Space: The politicization of the digital space is a growing area of inquiry. However, regulation of civic engagement online is inconsistent and digital-political phenomena are inherently difficult to measure. Researchers explored the policies and laws that regulate (or do not regulate) content online and the private firms that host this content; how private firms are creating and enforcing de facto restrictions on expression and speech; and the types of, integration of, and gaps in data to study these topics.
  • Developing a Taxonomy of Algorithmic Bias: Algorithms promise decision making divorced from human intervention, and therefore human bias, but this view has proven naive. In this session, we examined the sources of bias in algorithmic training and use, the role of algorithms in perpetuating pre-existing bias, and the goal of detecting and eliminating algorithmic bias. Legal experts and data scientists discussed the role of algorithms in the law, including algorithms’ impact on areas such as criminal and employment law and legal limitations on the design and use of algorithms.

Next steps for attendees include seeking additional guidance from NSF to align with its goals, conducting a careful analysis of the EasyRetro boards used during breakout sessions, and recruiting volunteers to continue the conversation and momentum until the planned fall meeting.

At the fall conference, researchers plan to continue working toward advancing relevant theories from the natural and social sciences, lowering the barriers to entry, and creating ongoing conversations that promote fairness in data science and clarity in the legal implications of data-driven decision making.

If you’re interested in learning more or getting involved with future events, please contact Sarah Davis at sdavis@renci.org.

By Sarah Davis, Senior Project Manager, and Jayasree Jaganatha, Communications & Marketing Specialist, RENCI


RENCI partners with CUAHSI and others to launch Critical Zone Collaborative Network Hub

Five-year cooperative agreement offers opportunity to accelerate research on boundary layers of rock, soil, air, water, and living organisms

The Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI) has been selected to be the Coordinating Hub for the NSF-funded Critical Zone (CZ) Collaborative Network.

Collaborators in this new venture include representatives from RENCI, the US Geological Survey, Pennsylvania State University, Utah State University, and the Lamont-Doherty Earth Observatory of Columbia University. All members of the team have experience with Critical Zone Science and the previous Critical Zone Observatory Network.

Operation of the Hub will include four primary tasks:

  1. Enhance and integrate existing data services and establish cyberinfrastructure with a distributed architecture that links existing data facilities and services, including HydroShare, EarthChem, SESAR, OpenTopography, and eventually other systems via a central Hub that provides services for easy data submission, integrated data discovery and access, and computational resources for data analysis and visualization.  
  2. Support discovery through community synthesis activities and via access to community data and modeling cyberinfrastructure.
  3. Broaden the CZ community through outreach and education activities.
  4. Enhance collaboration among the CZ Thematic Clusters through coordination, sharing, community meetings, and outreach. The nine CZ Thematic Clusters across the US conducting basic research into the structure, function, and processes of the critical zone are:
  • CINet: Critical Interface Network in Intensively Managed Landscapes
  • The Coastal Critical Zone: Processes that transform landscapes and fluxes between land and sea
  • Bedrock controls on the deep critical zone, landscapes, and ecosystems
  • Dynamic Water Storage — Quantifying controls and feedbacks of dynamic storage on critical zone processes in western montane watersheds
  • Urban Critical Zone processes along the Piedmont-Coastal Plain transition
  • Patterns and controls of ecohydrology, CO2 fluxes, and nutrient availability in pedogenic carbonate-dominated dryland critical zones
  • Dust in the Critical Zone from the Great Basin to the Rocky Mountains
  • Using Big Data approaches to assess ecohydrological resilience across scales
  • Geomicrobiology and Biogeochemistry in the Critical Zone

RENCI’s role in the Hub will be to host the cyberinfrastructure for the CZ Hub data submission portal.

RENCI to help guide effort to improve the efficiency of drone applications by leveraging edge, in-network, and cloud computing

The Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill will collaborate on a $749,998, two-year effort to develop new architectures and tools for the safe, efficient, and economic operation of drones. The funding was awarded by the National Science Foundation (NSF).

Led by the University of Massachusetts Amherst, scientists from RENCI, the Information Sciences Institute (ISI) at the University of Southern California (USC), and the University of Missouri will collaborate on FlyNet, a project that will utilize edge, cloud, and in-network computing to generate crucial data that will help them address a variety of pressing issues presented by drones.


RENCI to develop advanced network software for AtlanticWave-SDX 2.0

Sharing big data requires big networks. Systems like AtlanticWave-SDX, which connects networks in the U.S., Chile, Brazil, and South Africa, provide specialized infrastructure needed to send vast amounts of scientific data across long distances, helping scientists make the most of powerful data collections.

RENCI scientists contributed to the development of AtlanticWave-SDX, a distributed experimental software-defined exchange (SDX) that uses cutting-edge network technology to facilitate the exchange of data between research and education networks in the U.S. with networks on other continents.

Now, RENCI will play a leading role in software development and testing for AtlanticWave-SDX 2.0. The five-year project, supported by a recent $6.5-million award from the U.S. National Science Foundation (NSF), is led by Florida International University and also includes the University of Southern California.


ROBOKOP technology offers faster, easier exploration of emerging COVID-19 research

As scientists around the world urgently work to understand the best ways to diagnose and treat COVID-19, quick and easy access to the latest research findings and rapid exploration of emerging data have become critical. RENCI scientists have developed new tools and approaches that can help researchers make important discoveries and answer key questions about COVID-19 in record time.

“These new approaches allow scientists to blend together novel observations and information from recent papers with previously known information that can be used to inform, contextualize, and test new COVID-19 information,” said Chris Bizon, director of analytics and data science at RENCI.


New digital laboratory helps get COVID-19 analyses up and running quickly

Data analysis and visualization are helping answer a variety of questions about COVID-19, such as who is most at risk, how the disease is spreading, and what approaches might work best for treatments. However, setting up a computer environment to analyze the large amounts of data needed to answer such questions is no easy task. It requires selecting data libraries, software, and hardware and estimating how much memory and computing power will be needed. This process is time-consuming, and few individuals have the complex skill set needed to accomplish it.

RENCI scientists have developed a new digital data science laboratory called Blackbalsam that can help significantly shorten the planning stage for these efforts with a standardized environment housing computational and data sets for COVID-19 analytics.  

“As COVID-19 progressed, I saw that researchers were conducting analyses and visualization on an increasingly varied set of COVID-19 data,” said Blackbalsam co-author Steven Cox, assistant director of software systems architecture at RENCI. “I realized that it would be very helpful to have an environment that overcomes well-known technological and skill barriers by providing an interface that researchers with statistical, analytical, and visualization skills could use.”


Professor learns new lessons while teaching during a pandemic

When UNC students left for spring break on March 9, the COVID-19 public health crisis was just heating up. Soon after, UNC administrators made the decision to move to remote teaching and extended the break by a week to give instructors time to prepare. RENCI Deputy Director Ashok Krishnamurthy was one of many UNC professors who made the quick transition to teaching via video conferencing on Zoom.

What course were you teaching when you received notice that classes would all be moved online?

I was teaching a computer science course called Introduction to Scientific Programming that is designed for non-computer science majors. Most of the students take the class to learn programming skills for their day-to-day work or research. My section of the course had about 160 students enrolled.

How easily were you able to convert this class to a virtual format?

Fortunately, the course was relatively easy to adapt to virtual teaching. The UNC Computer Science department and my colleague John Majikes, who was teaching another section of the same course, have set up this course in such a way that taking it online was quite straightforward.


Beyond data: Supporting community during a pandemic

Families showing off their new face masks, donated by Sarah Davis.

When COVID-19 cases began to appear across the country, many RENCI employees felt a call to action. While several took it upon themselves to develop new data science technologies or to adapt existing ones to process COVID-19 data, others have contributed to communities in need by creating face masks, assisting food banks, connecting researchers to projects, and supporting foster youth.

Creating Face Masks

Like many across the nation, some RENCI employees have started sewing face masks to donate to medical workers, neighbors, and people in need.


FABRIC and iRODS events shift to virtual venues

Due to the current COVID-19 situation, many previously scheduled in-person events are moving to virtual spaces. Events associated with RENCI projects and with consortium and partner institutions are making decisions daily about whether to postpone, transition to virtual, or potentially proceed in the late summer and fall. We will keep our events calendar updated, so check back regularly for announcements. 

Two major events that have made the choice to transition to virtual are the FABRIC Community Workshop and the iRODS User Group Meeting. Although this change is unprecedented, both teams are adjusting their sessions to accommodate the virtual atmosphere and provide a memorable experience for attendees.


From the Director: RENCI responds to the COVID-19 crisis

The past few weeks have presented unique challenges to how we work, how we enjoy ourselves, and how we live our everyday lives. We are all worried about the uncertainties and about how this will affect us and our loved ones in the coming weeks. 

That being said, I am proud to work at RENCI and UNC-Chapel Hill, and blessed to be confronting the current challenges in a region where we are so fortunate to have skilled personnel and resources to bring to bear.

We can gather data and compute. We can volunteer. We can serve. We can encourage each other. We can broadcast. With relatively limited risk through working remotely, we can use our brains, our team spirit, our good will, our tools, and our machines to do science and serve.
