RENCI participates in NSF Cyber Carpentry workshop to prepare early-career researchers

Big data is only getting bigger, and that can cause big problems for researchers who need to store and share their data. Twenty doctoral students and post-doctoral associates from across the country learned the tools and techniques to solve these problems at the inaugural Cyber Carpentry Workshop at the University of North Carolina at Chapel Hill. Sponsored by the National Science Foundation (NSF) and hosted by the UNC School of Information and Library Science (SILS), the two-week workshop in late July introduced students to a variety of applications, platforms, and processes for data life-cycle management and data-intensive computation. The Renaissance Computing Institute (RENCI) provided support for the workshop in the form of instructors and project management staff.

From left: Andres Espindola-Camacho from Oklahoma State University, Jeremy Thorpe from Johns Hopkins University School of Medicine, Gaurav Kandoi from Iowa State University, and Yingru Xu from Duke University discuss an issue with their team project.

“Previously, you had maybe a thousand files, maybe ten thousand,” said Arcot Rajasekar, SILS professor and RENCI chief domain scientist in data grid technology. “Now, you’re talking about 100 million files and doing simulations and emulations that can create petabytes of data. Managing that just by human interaction is not going to be effective; you need some automation there. In addition to the volume of data, you have to consider the velocity of data coming in and the multiple varieties of data you’re collecting. This is not easily done without a good level of management.”

Though not affiliated with Software Carpentry or Data Carpentry, Cyber Carpentry organizers drew inspiration from those projects. The workshop at Carolina brought together data professionals, educators, and researchers from RENCI, the iRODS Consortium, SILS, the Odum Institute, the University of Arizona (CyVerse), Indiana University (Jetstream), the University of Virginia (Hydroshare), Drexel University, and Amazon (AWS) to teach these intensive two-week courses.

The workshop familiarized participants with the concepts of virtualization, automation, and federation as defined through the Datanet Federation Consortium (DFC), an NSF-funded project that promotes sharing within and across science and engineering disciplines. Instructors introduced specific DFC web portals, including CyVerse, Dataverse, DataONE, and Hydroshare, as well as relevant software, metadata management strategies, and large-scale workflows.  

Participants learned the basics of the integrated Rule-Oriented Data System (iRODS), which is free open source software for data discovery, workflow automation, secure collaboration, and data virtualization used by research and business organizations around the globe. Housed at RENCI, the iRODS Consortium guides development and support of iRODS. Terrell Russell, iRODS chief technologist, and Hao Xu, a RENCI research scientist, both taught courses about iRODS during the two-week workshop.

“The students in this workshop are not yet in charge of securing federal funding and writing data management plans, but they’ll be there very soon,” said Russell. “We want them to know about the tools they’ll need when the time is right.”

iRODS Chief Technologist Terrell Russell discusses the capabilities of the open source data management software with Cyber Carpentry participants.

The workshop drew students from across the country, with NSF funding providing travel and accommodation support. Anuja Majmundar, a doctoral student at the University of Southern California, said the Cyber Carpentry workshop offered a great opportunity for her to learn tools and procedures that could make data science more reproducible and scalable, especially for the diverse data streams she encounters in her research on health behaviors.

Jocelyn Colella, a PhD candidate in evolutionary genomics at the University of New Mexico, said gaining experience with containers – programs that can virtualize entire scientific workflows, including software, libraries, and data – was one of the highlights of her experience, and that the introduction to the Jetstream and CyVerse virtual environments had significant implications for her research.

“Coming from a smaller lab, it has been incredibly expensive to build the computing resources and data archival infrastructure necessary to deal with terabytes of genomic data,” she said. “Learning about the free computational and storage resources available through NSF-funded projects has revolutionized how I conceptualize my own workflows and will alter how I apply for grants going into the future.”

This workshop was funded by the NSF Cyber Training program. Look for information about the 2019 summer workshop at

Strategies for hiring and maintaining a diverse data scientist workforce

RTI’s Kristina Brunelle (left) moderates a panel discussion with Amy Roussel, RTI (center); Gracie Johnson-Lopez, Diversity and HR Solutions (right); and Sackeena Gordon-Jones, Transformation Edge and NC State University (on screen).

Data science is hot. That’s good news for workers with data science skills. It also means organizations competing to hire data scientists need to understand how to recruit talent that will solve their data science challenges and contribute to creating a productive and diverse workforce.  Read more…

Winston-Salem State University students visit RENCI, UNC-Chapel Hill

A group of undergraduate Winston-Salem State University (WSSU) students majoring in math recently visited Chapel Hill for an educational tour of RENCI and to make connections with peers and educators in the UNC-Chapel Hill math department.

John Hutchens and Felicia Griffin, assistant professors in the mathematics department at WSSU, arranged the visit as part of a series for their students to highlight the types of jobs available to math and computer science graduates. Read more…

Tracking the story of the ENIAC programmers

Jean Jennings Bartik (left) and Frances Bilas Spence in 1946 with ENIAC, the world’s first all-electronic computer. Photo credit: Computer History Museum

Six women who changed computing finally get their day in the spotlight.

More than 70 years ago, six brilliant mathematicians came to Philadelphia to take part in a secret U.S. Army project designed to help the Allies win World War II. These young pioneers of the computing age learned to program using only logical diagrams and their considerable talents—no programming languages or tools existed to help them.  Read more…

Women to show their data science chops at 2018 WiDS conference

The Women in Data Science (WiDS) Conference returns for a third year to Stanford University on March 5. This one-day, technical conference features world-class speakers discussing a wide array of data science, machine learning, and artificial intelligence research and applications, from computational finance, to astrophysics, to cybersecurity, and much more. All genders are invited to participate in the conference, which features exclusively female speakers.  Read more…

South Big Data Hub, NCDS help sponsor Southern Data Science Conference; registration now open

The 2018 Southern Data Science Conference (SDSC 18) will bring experts and researchers from top companies and research institutes to Atlanta on April 13 and 14 for two days of sharing best practices and discussing the latest issues, challenges, and trends in data science.  Read more…

RENCI makes an impact at fall AGU meeting

Chris Lenhardt (left) and Howard Lander with their poster at the AGU fall meeting.

Why was New Orleans inundated with scientists during the week of December 11 – 15? They were in the Big Easy for the fall meeting of the American Geophysical Union (AGU), a conference that attracts about 24,000 Earth and geoscientists from across the nation and around the globe. Read more…

RENCI’s Claris Castillo strengthens her leadership muscles through BRIDGES

Claris Castillo, a senior computational and networked systems researcher at RENCI, recently completed the BRIDGES program that promotes academic leadership among women. The four-week program targets women in higher education institutions seeking to strengthen their academic leadership skills and advance their careers in academia.  Read more…

Educators offer tips on making sense of the data revolution

“We are creating every 10 minutes what we were creating every 2,000 years, and that’s the problem.”

This statement, by panelist Arcot Rajasekar, succinctly sums up one of the many challenges stemming from the modern big data environment discussed at “A Citizen’s Guide to Big Data.” Read more…

RENCI scientist set to join the ranks of Climate Reality Leaders

As the Climate Reality Project website says: Ordinary people face challenges. Climate Reality Leaders embrace them.

RENCI’s Chris Lenhardt, an environmental data science and systems expert, will join the ranks of these leaders when he attends three days of training Oct. 17 – 19 to become part of the Climate Reality Leadership Corps. The Leadership Corps is a global network of individuals committed to tackling the climate crisis and solving what is far and away the greatest challenge of our time. The Leadership Corps is part of the Climate Reality Project launched by former Vice President Al Gore to increase awareness about climate change and to support efforts at all levels aimed at reducing carbon emissions. Read more…

