Why Data Commons? Because scientists want to focus on science, not infrastructure


ESIP meeting participants discuss the challenges of a Data Commons at their recent summer meeting in Durham, NC.

After more than 25 years as a science communicator, I’ve come to recognize the things that all scientists, regardless of their disciplines, yearn for. It’s not an endless stream of funding or appreciation from the public for their work (although both would be nice).

Most scientists simply want to be able to concentrate on their science, rather than on the tools, technologies, and resources that make modern-day collaborative science possible. A Formula 1 driver wants to drive really fast without worrying about what's going on under the hood. In the same way, scientists want to do their work while computing, data management, security, and the like happen under the hood. That lets them focus on solving problems rather than on the infrastructure that supports problem solving.

It’s a beautifully simple concept that is, unfortunately, difficult to implement.

The term Data Commons (sometimes called Science Commons or Science as a Service) gets tossed around often among scientists and technologists working to make science more productive. To learn about the latest work aimed at letting scientists do science, I attended a session on Data Commons for the geosciences at the recent Earth Science Information Partners (ESIP) Federation meeting in Durham, NC.

A shout out to the following RENCI colleagues for organizing this session: CTO Charles Schmitt, Director of Environmental Initiatives Brian Blanton, Domain Scientist for Environmental Initiatives Chris Lenhardt, and Senior Research Software Developer Howard Lander. These guys and the teams they work with dedicate much brainpower to Data Commons and other strategies that facilitate more productive science and new scientific insights. Following is a recap of what I gleaned from the session:

  • Data Commons has different definitions depending on who you ask. For the ESIP session, Lenhardt defined it as integrated cyberinfrastructure for science research that brings together all the applications, services, and resources needed to conduct research. In such a system, a researcher would typically sign in and see a dashboard that pulls together tools, applications, integrated data management, curation, publishing, security, and more. The physical location of these tools and services is unimportant as long as the scientist can access them.
  • The realities of 21st-century research drive the need for a Data Commons. Interdisciplinary, collaborative work generates more (often heterogeneous) data and requires more models, simulations, and analytics, which makes federated resources a necessity. It also requires integrated data management so that data can be easily accessed, kept safe, and stored for future use. Yes, Hollywood still loves the image of a slightly crazy, way-too-intelligent scientist working alone in some scary-looking secret lab, but that's not how research works.
  • No perfect system yet keeps all scientific support infrastructure under one roof. However, some scientific groups have created infrastructure that addresses at least part of the problem. EUDAT, the European Data Infrastructure, offers a suite of tools for finding, syncing and exchanging, storing, sharing, and safely replicating research data, as well as a tool for sending data to compute resources. Other examples of Data Commons include the INCF (International Neuroinformatics Coordinating Facility) Dataspace for sharing neuroscience data, text, images, sounds, movies, models, and simulations, and the National Cancer Institute's Genomic Data Commons, which provides cancer researchers with a unified data repository that enables data sharing across studies.
  • If the need is great, why all the discussion? Just get to work, right? If only it were so simple. Different science communities use different terminology even when referring to the same phenomenon, specimen, or disease symptom. That makes sharing data across disciplines difficult. Disciplines also differ in how they work and collaborate, in their computing and analysis needs, and in how collaborative and interconnected they are. With so much variation, a universal Data Commons is unlikely. However, building domain-specific Data Commons that can link with one another and share features and tools is an attainable goal.

Should the geosciences develop their own Data Commons? ESIP participants say yes. That means geoscientists need to collaborate with data scientists, networking and security specialists, and computer scientists to develop a commons that is tightly linked to real science problems and grows from the ground up based on the needs of the community. ESIP has started to address this challenge through its sustainable data management cluster, a group that promotes collaboration and coordination in managing environmental science data. Others suggest surveying what's already been done to avoid duplication and reinvention. A national-level study of what kind of infrastructure science needs could also help solve the Data Commons challenge.

Whatever happens, the need for scientific cyberinfrastructure continues to grow, and scientists continue to wish for that under-the-hood solution that will finally free them from juggling the roles of domain scientist, data scientist, computer scientist, and network scientist. Let geologists be geologists, meteorologists be meteorologists, physicists be physicists…you get the picture.

-Karen Green

Introducing the Women of RENCI

As Women’s History Month draws to a close, RENCI acknowledges the daily hard work of each of its female employees. The research strides occurring at RENCI would not be possible without our female researchers, project coordinators, administrators, and communicators.

From left to right: Asia Mieczkowska, Jennifer Resnick, Claris Castillo, Hong Yi, Lea Shanley, Caryn Best, Lisa Stillwell, Margaret Wesley, Kristi Andrews, Laura Capps Hill, Rebekah Sturgess, Karen Green, Dawn Carsey, Annie Goessling, and Stephanie Suber

Recently, the RENCI communications team rounded up as many "Women of RENCI" as possible for a group photo and to learn more about how they contribute to the organization. The list below (and the photo) summarizes the information gathered that day.

RENCI CTO speaks to high school students on the future of computer science

The next generation of potential computer scientists walks into K-12 classrooms every day, but are these young minds being exposed to the fundamentals of computer science? According to Code.org, only one in four American high schools offers computer science courses, and few of those schools allow the course to count toward graduation.

To help change these numbers, some computer scientists are working harder to share their knowledge and experiences from the field. RENCI's Director of Informatics and Chief Technology Officer Charles Schmitt, PhD, joined the cause recently when he visited the North Carolina School of Science and Math (NCSSM) to speak to a group of students about computer science.

Research Triangle Analysts at RENCI: Topological Data Analysis

Research Triangle Analysts met at RENCI for their first monthly meeting of the new year on January 19. Research Triangle Analysts meet at RENCI every third month and elsewhere around the Triangle during other months. The group, a 501(c)(3) non-profit and all-volunteer organization, promotes the advancement of data science throughout the Triangle’s collaborative communities of analysts, mathematicians, statisticians, and scientists.

Research Triangle Analysts participants learn about topological data analysis at RENCI.

Hamza Ghadyali, a PhD candidate in mathematics at Duke University, was the featured speaker. Ghadyali develops new topological data analysis (TDA) tools, particularly for analyzing electroencephalogram (EEG) data. Topology is the mathematical study of shape. TDA tools analyze large, noisy, complex datasets from disciplines such as oncology, astronomy, meteorology, and neuroscience. Analyzing the shapes in data, and how those shapes change, yields insight into the data itself.

Crossing the pond in the name of better data management

iRODS Chief Technologist Jason Coposky offers guidance to iRODS users at the University of Utrecht.

The iRODS data management platform and the iRODS Consortium that works to sustain it are making waves well beyond their home base in Chapel Hill, NC.

This week, three of the smart, savvy people behind iRODS and the Consortium (iRODS originator Reagan Moore, Consortium Executive Director Dan Bedard, and Chief Technologist Jason Coposky) traveled to France, the United Kingdom, and the Netherlands to talk about the benefits of iRODS as a data management solution for large distributed research projects, to provide training for those interested in becoming iRODS power users, and generally to evangelize about software that is now being used far and wide in Europe, the U.S., Asia, South America, Australia, and South Africa.

Coffee and Viz series brings teaching in a Social Computing Room to life

Professors at NC State University and UNC-Chapel Hill have access to a tool that can bring both excitement and exploration into their curriculum: the Social Computing Room (SCR). Although the resource is available on both campuses, educators may be unsure how it fits effectively into their course plans.

NC State's Coffee and Viz series aims to give instructors of all disciplines ideas by highlighting those already using SCRs and other visualization spaces, and by featuring speakers with novel ideas for using visualization in education and research.

DataNet presentations lead to invigorating discussion at ESA annual meeting

DataNet Tools and Services was the topic of a session at the recent Ecological Society of America Annual Meeting, held last month in Baltimore.

Chris Lenhardt and Mike Conway presented in the session on behalf of the UNC-Chapel Hill-based DataNet Federation Consortium (DFC). Chris is lead of the DFC Facilities and Operations team and is active in RENCI's environmental sciences group; Mike is a senior developer with DFC.

Organized by Amber Budden of the DataONE DataNet project, the session used the IGNITE format: a series of 5-minute, 20-slide talks followed by Q&A. The fast-paced IGNITE talks present forward-looking, unconventional, and/or controversial ideas to spur the audience to question their usual assumptions and think creatively about the topic. Both DFC IGNITE talks challenged the audience to consider how a data management system can provide tools and services for scientists that go beyond simply storing, indexing, discovering, and accessing data files.

Three keys to work-life balance

Last week, I was asked to speak to young professionals about work-life balance, so I have been pondering this topic a lot. How do you juggle both a full-time, demanding and exacting career and the often-contradictory demands of raising little human beings to become productive members of society? To be honest, I think the “secret” is that all of us are just winging it, really, and we are creating and maintaining balance as we go – even if it doesn’t appear that way to others from the outside. Parenting and careers are all about change. Just when you think you have achieved the perfect balance, something changes – your child starts potty training, enters puberty, adjusts to a new school, or gets chosen for a school team. You earn a promotion and gain new responsibilities, move offices (which affects your commute), or start a new job. Your spouse has to travel more or has a change in health condition. Older family members need care and help in a way they haven’t before.

Software to accelerate science

Over three years, the WSSI has shown that software best practices can make a difference in water science.

Ask any elementary school student and they will tell you that water is a renewable resource.

Unfortunately, this "fact" comes with a few complications, not least that if we are not careful stewards of our water, it will run out. According to the United Nations Department of Economic and Social Affairs (UNDESA), "more than 1.7 billion people live in river basins where depletion through use exceeds natural recharge."

This trend could see two-thirds of the world’s population living in water-stressed countries by 2025.

Understanding and sustaining water resources depends on using the best scientific modeling and software development practices, which is why RENCI has been part of the Water Science Software Institute (WSSI) planning grant for the past three years.

Collaborations in coastal resilience

New funding for DHS Center of Excellence means continued collaboration with RENCI on coastal issues

The U.S. Department of Homeland Security (DHS) recently announced it will provide $20 million over five years to fund the Coastal Resilience Center of Excellence (COE) at UNC-Chapel Hill. That’s a good thing for people in coastal areas who each year must cope with hurricanes, erosion, flooding, and storm surge.

Aftermath of Hurricane Sandy. RENCI and the DHS Coastal Resilience Center work together to improve hurricane storm surge prediction.

The new grant acknowledges the effectiveness of a longtime partnership between the Coastal Resilience COE (formerly the Coastal Hazards Center of Excellence) and RENCI. For more than five years, Brian Blanton, RENCI’s director of environmental programs and a coastal oceanographer, has worked closely with Rick Luettich, lead investigator for the Coastal Resilience COE and director of UNC’s Institute of Marine Sciences, to enhance the ADCIRC storm surge modeling system and put it to use as a tool to help coastal communities understand, predict, and mitigate the impacts of coastal storms.
