In its prime, the UNC supercomputer called Topsail had a peak performance of nearly 30 trillion calculations per second—more than enough to earn a spot on the biannual Top500 supercomputers list.
That was in 2007, and as anyone familiar with Moore’s law knows, constant improvements in computing hardware mean a short life for even the most powerful high-performance machines.
By 2012, Topsail, housed at UNC’s Information Technology Services in the ITS Manning Building, was destined to become surplus. After six years of helping UNC researchers conduct computationally intensive work in the biosciences, medicine, environmental sciences and other fields, its 530-plus nodes of dual quad-core processors were slow and outdated compared to newer models of parallel computers.
But Topsail found a new lease on life thanks to resourceful computer engineers at RENCI, who gave the aging hardware the geek equivalent of a makeover.
“Topsail was about to be surplused, but the RENCI informatics group had the idea of refurbishing and reconfiguring it as a machine for big data problems,” said Erik Scott, a senior research software developer at RENCI.
Scott and RENCI Systems Specialist Mark Montazer set to work last August transforming Topsail from a supercomputer with separate file servers and processors into one that integrates processing and file serving. That integration, said Scott, makes data input and output (I/O) much faster and creates a machine well suited for handling data-intensive problems.
All parallel supercomputers divide a computing job into many smaller pieces and distribute those pieces across the machine’s nodes. Computationally intensive jobs, such as climate modeling, modeling complex biological systems, or astrophysics research, require the nodes to work in parallel on many different, complex mathematical problems. Data-intensive computing, by contrast, generally involves performing the same kind of calculation over and over across the computer’s nodes. Because the data sets are huge, fast I/O is essential for running the same analysis on long streams of data.
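To make the data-intensive pattern concrete, here is a minimal sketch in Python. It is an illustration only, not RENCI's software: the function names, the toy sequence, and the chunk size are all hypothetical, and a thread pool on one machine stands in for Topsail's nodes. The same simple calculation (tallying DNA letters) is applied to every chunk of the data, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(sequence, chunk_size):
    """Cut one long string into consecutive pieces of at most chunk_size letters."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_bases(chunk):
    """The single calculation every worker repeats: tally the letters in a chunk."""
    return Counter(chunk)


def parallel_base_counts(sequence, chunk_size=4, workers=3):
    """Run count_bases on every chunk concurrently, then merge the partial tallies."""
    chunks = split_into_chunks(sequence, chunk_size)
    # Threads stand in for cluster nodes here (an assumption made for illustration).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_bases, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical toy input; real genomic data sets are many orders of magnitude larger.
toy_sequence = "ACGTACGTTTGACA"
totals = parallel_base_counts(toy_sequence)
print(dict(totals))
```

The arithmetic in each worker is trivial. On real data the cost is dominated by reading the chunks, which is why the article stresses fast I/O.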
A new home and a new lease on life
Analysis of genomic sequencing data is the perfect example of data-intensive computing. For that reason, Topsail was relocated from ITS Manning to the Genome Sciences Building, where it is easily accessible to researchers studying the relationship between genetic variance and disease. Using the new “Son of Topsail,” UNC researchers can now process genomic data sets 200 to 400 times faster than on a high-end desktop computer, according to Scott.
“From a math standpoint, the problem that genomic scientists deal with is very straightforward, but there is a huge amount of data to process,” said Scott. “Between the public genomic databases and the sequencing that’s been done here, UNC has about 35 trillion base pairs (the building blocks of DNA’s double helix) and that number is constantly growing.”
The new Topsail has been used to analyze sequenced genomes in a study of genetic variants that might influence a person’s susceptibility to substance addiction, led by Kirk Wilhelmsen, MD, PhD, RENCI’s chief scientist for genomics and a professor in the UNC School of Medicine’s neurology and genetics departments. It also enables the North Carolina Clinical Genomic Evaluation by NextGen Exome Sequencing (NCGENES) project. NCGENES, which involves researchers at the School of Medicine and RENCI, aims to develop a system that will quickly process and analyze patients’ genomic data to determine their risks for genetic diseases and improve clinical treatment. Jim Evans, MD, PhD, Bryson Professor of Genetics and Medicine at UNC’s School of Medicine, leads the NCGENES project.
The revamped computer also caught the attention of Don Smith, a UNC Computer Science professor who teaches Data Center Systems and Programming, a course on systems designed for big data problems.
Smith needed a computing resource that would allow his students to work with Hadoop, a popular software framework used to support distributed, data-intensive applications. The class had been using a very small cluster, but access to Topsail allows students to work on larger data sets and on problems similar to those they would encounter in professional business and research settings.
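For readers unfamiliar with the programming model Hadoop is best known for, MapReduce, the sketch below walks through its three phases in plain Python. It does not use Hadoop or its API (Hadoop jobs are typically written in Java and run across a cluster), and the function names, the toy reads, and the choice of counting three-letter DNA fragments are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_phase(read, k=3):
    """Emit a (fragment, 1) pair for every length-k window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def run_job(reads, k=3):
    """Chain the phases on one machine; a real framework spreads them over nodes."""
    emitted = [pair for read in reads for pair in map_phase(read, k)]
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical toy reads; a real job would stream them from distributed storage.
toy_reads = ["ACGTAC", "GTACGT"]
kmer_counts = run_job(toy_reads)
print(kmer_counts)
```

Because each read is mapped independently and each key is reduced independently, both phases can be spread across as many nodes as a cluster offers.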
“Having access to a cluster of this scale is a great opportunity,” said Smith. “It gives the students the chance to do work that is very similar to something they would do in industry.”
The computer science students will begin using Topsail for their final projects when classes reconvene after Spring Break. Smith said they will be encouraged to tackle big data problems related to their interests, including genomics, medical imaging and Web analytics.