Building better classifiers for reproducible science

RENCI’s Clark Jeffries recently presented a webinar for Orion Bionetworks called “Seeking Best Practices in Classifier Construction and Testing.”

Jeffries is a PhD-level bioinformatics specialist with an interest in interpreting neuroscientific information to better understand and treat psychiatric and neurological conditions. For years he has worked with researchers in the School of Medicine at UNC-Chapel Hill to analyze data and better understand debilitating diseases like schizophrenia.

Research that involves data from patients with medical conditions always poses challenges. The researchers must keep the data safe and confidential, and they must construct experiments carefully and concisely to ensure that other scientists can reproduce their results.

According to Jeffries, three factors affect reproducibility: case/control and selection of subjects; the consistency of lab assays; and consistent construction and testing of classifiers. Jeffries has focused much of his recent work on the third factor, developing stringent testing of classifiers to ensure that samples are classified correctly and are useful research tools.

You can read about Jeffries’ recent work with a team led by Diana O. Perkins, a professor of psychiatry in the UNC School of Medicine, here. For more depth about building and testing classifiers, check out the Orion Bionetworks webinar here.

Classification methods for biological samples might sound a bit arcane to those of us who don’t spend our days in life sciences labs. But it’s that attention to detail that helps bring about breakthroughs – for example, a blood test that could help doctors determine if a patient showing psychiatric symptoms is likely to develop psychosis.

Environmental science infrastructure across continents


The CoopEUS annual meeting will be held in Helsinki, Finland, starting Sept. 30

Infrastructure for research in the environmental sciences shouldn’t be constrained by national boundaries. That’s the idea behind Cooperation EU-US, or CoopEUS.

Launched by the U.S. National Science Foundation and the European Union through its Research Infrastructures action of the 7th Framework Programme for Research and Innovation, CoopEUS brings together American and European scientists involved in environmental research projects for collaboration that will facilitate building a truly global and integrated infrastructure to support environmental research.

Large environmental research efforts, such as the NSF-funded National Ecological Observatory Network (NEON) and Ocean Observatories Initiative (OOI) and the EU’s European Plate Observing System (EPOS) and LifeWatch, need a global, integrated research infrastructure in order to fully address issues that have global impacts. The key to creating this infrastructure is the ability to easily access and openly share data—both raw data collected by environmental sensors and research data from thousands of projects underway across both continents.

If CoopEUS is successful, it will help major initiatives such as NEON more quickly and seamlessly produce high-quality scientific results. In turn, that new knowledge will help individuals, organizations and governments understand and cope with major environmental issues, such as sea level rise and drought.

CoopEUS holds its annual meeting beginning Sept. 30 in Helsinki, Finland, and RENCI’s Chris Lenhardt, domain scientist for environmental data science and systems, will be there as the newly appointed chair of the CoopEUS Strategic Cooperation Board.

The RENCI connection with CoopEUS means our own computer scientists, software engineers, and networking and data specialists could play a role in developing the CoopEUS roadmap and creating its research infrastructure. Through RENCI, CoopEUS also has the opportunity to link to some major data science initiatives, including the National Consortium for Data Science (NCDS) in the U.S., and the international Research Data Alliance (RDA). It also means software products produced by RENCI-led initiatives, such as the NSF-funded Water Science Software Institute, could be linked into the CoopEUS infrastructure.

It’s good to know there are organizations that understand the importance of data to 21st-century science and that value cooperation over competition. CoopEUS is only one of them, but the strategies it develops to enable data-driven environmental science could likely be applied to other fields of research, as well as to business, government and more.

So, here’s a shout out to those on both sides of the pond who helped create CoopEUS. It’s this kind of collaborative, innovative approach that will help us transform data from a whole lotta ones and zeros into a tool that moves science and society forward.

-Karen Green


Codefest to focus on collaboration and results

Here’s a concept: a conference where people get real work done.

I’m not attempting to be snarky or to criticize the many worthwhile conferences offered to professionals every year; I’m simply paraphrasing a statement on the Open Science Codefest website. OSCodefest will bring together scientists and programmers involved in developing scientific software who think their work could benefit from collaboration with other researchers, software engineers and developers. Read more…

In science, software matters

Outer Banks flooding from Hurricane Irene (2011). Modeling software used to understand high-impact events will benefit from software development best practices.

In the 21st century, it’s impossible to separate science from the software scientists use to collect data, run computer models and analyze model outputs.

Several RENCI experts make the case for sustainable software development practices in scientific research in two articles recently published in the Journal of Open Research Software.  In the first article, written by RENCI Senior Scientist and Oceanographer Brian Blanton and Chris Lenhardt, domain scientist for environmental data sciences and systems, the authors point out that developing scientific software that is sustainable, accessible, and transparent is especially important when policy decisions and public safety are at stake. Read more…

Coming soon: A “Facebook for hydrologists”

After two years of work, HydroShare, a “Facebook for hydrologists,” will go live in July as an open source website. The HydroShare research team, which includes collaborators from RENCI, Brigham Young University, the University of South Carolina, Purdue University, Tufts University, the University of Texas at Austin, and the University of California at San Diego, received a $4.5 million grant from the National Science Foundation in July 2012 for this project.

HydroShare will give hydrologists the technology infrastructure they need to address critical issues related to water quality, accessibility, and management. The open website will be similar to YouTube in that it will allow users to simply drag and drop their models and data to upload to the site, and similar to Amazon, with a message board and rating system. It will provide an online collaborative environment for discovering, accessing and sharing water science research. Read more…

Big data innovation explodes at RENCI

Claire McPherson of Deloitte closes out a great Innovation Summit.

The creative vibes were buzzing at the First NCDS Data Innovation Showcase, held at RENCI on Wednesday, May 21, 2014. The Showcase, centered on innovative strategies for the rapidly expanding big data field, brought together NCDS members, faculty and students to share data-related projects, activities and ideas.

The Showcase began with a student poster session during breakfast, where students from NCDS academic institutions showcased their submitted posters on data-related projects and networked with industry and university professionals. The group then gathered to hear short presentations from each of the NCDS Data Science Faculty Fellows on the background, goals, and intended results of their research. Read more…

Girl Power in the Programming World

Though their numbers are expanding, women are still the minority in the computer development and website coding community, which remains about 90 percent male. A group of women in the Triangle, Girl Develop It (GDI), formed to provide a comfortable environment where women can learn about coding, website development and new technology and strategies at their own pace. As an institution that believes in diversifying STEM career fields and fully supports broadening the coding community, RENCI recently opened its doors to GDI to provide a space to host its “Python for Beginners” course.

The course, held at RENCI’s Europa Center headquarters on Saturday, April 26, consisted of an all-day workshop on Python, a programming language that powers some of the most popular websites and apps, including Pinterest and Instagram. It can be used to build websites, program robots, visualize data and run servers. Read more…

Creating the universe in a Social Computing Room

Students in a communications class at UNC-Chapel Hill have been using the recently opened Social Computing Room (SCR) in the Odum Institute to create their own universe.

The SCR provides an immersive 360-degree view of any visual content, allowing users to interact with and explore data in groups. The original SCR, built in 2007, is located in the RENCI space in ITS Manning on the UNC campus. Similar versions based on the original have recently opened at NC State University and in Odum’s offices in Davis Library. RENCI assisted in the technical design and implementation of the room, helping to install all the hardware, baseline operating system, and projectors, and supported part of the cost of outfitting the room. Read more…

Hacking for the public good

It’s one thing to say that the explosion of digital data can be used for public good. It’s quite another to sit down in a room for four hours, access open data files and create an application with practical uses.


View of the Wake County Parks Finder app created using the TerraHub platform.

That’s what Jeff Heard did when he attended the Triangle Open Data Day (TODD) a few weeks ago. TODD, organized by, was a daylong event to promote open data—the idea that digital information from sources such as government agencies should be easy to access and that if it is, smart, innovative techies will use it in interesting ways.

Heard, a RENCI senior research software developer and co-founder of the startup TerraHub, is one such smart, innovative techie. He attended TODD’s hackathon activity with the goal of taking a public data set and creating an application that could transform that raw data into useful knowledge.  And he succeeded. Read more…

The Human Impact of Genetic Research

Jim Evans of the UNC School of Medicine leads the NCGENES research team.

It’s been about a year and a half since I sat down with Jim Evans, MD, PhD, and Bryson Professor of Genetics and Medicine at the UNC School of Medicine, to learn about NCGENES, a research project to develop processes and a supporting cyberinfrastructure that will allow researchers, clinicians and patients to take full advantage of whole genome and whole exome sequencing. The project is funded by the National Human Genome Research Institute, one of the National Institutes of Health.

The patients enrolled in NCGENES – about 750 of them over four years – have undiagnosed conditions that likely have genetic causes. Through sequencing and an innovative, streamlined analysis process, the researchers hope to find genetic markers for their conditions, make diagnoses, and if possible, treat them. Read more…