Resources

RENCI has a 24,000+ square foot facility including a 2,000 square foot data center at 100 Europa Drive, Chapel Hill.

  • 2,000 square feet of floor space on an 18-inch raised floor
  • 600 kVA commercial power
  • 375 kVA UPS power
  • 20 kVA generator power
  • 134 tons of dedicated cooling
  • Room for 40 racks of high-performance computing, storage, and networking

RENCI began operations in 2004. Since then, the organization has acquired a range of computational systems to support its projects and activities. The following is a list of the major computational infrastructure currently active at RENCI.

HPC (Hatteras)

Hatteras is a 5120-core cluster running CentOS Linux and the SLURM resource manager. Hatteras is segmented into several independent sub-clusters with varying architectures and is capable of concurrently running nine 512-way parallel jobs. Hatteras uses Dell’s densest blade enclosures to maximize core count within each chassis. Hatteras’ sub-clusters have the following configurations (a sample job-submission sketch follows the list):

  • Chassis 0-3 (512 interconnected cores per chassis)
    • 32 x Dell M420 Quarter-Height Blade Server
      • Two Intel Xeon E5-2450 CPUs (2.1GHz, 8-core)
      • 96GB 1600MHz RAM
      • 50GB SSD for local I/O
    • 40Gb/s Mellanox FDR-10 Interconnect
  • Chassis 4-7 (640 interconnected cores per chassis)
    • 32 x Dell M420 Quarter-Height Blade Server
      • Two Intel Xeon E5-2470v2 CPUs (2.4GHz, 10-core)
      • 96GB 1600MHz RAM
      • 50GB SSD for local I/O
    • 40Gb/s Mellanox FDR-10 Interconnect
  • Chassis 8 (512 interconnected cores)
    • 16 x Dell M630 Half-Height Blade Server
      • Two Intel Xeon E5-2683v4 CPUs (2.1GHz, 16-core)
      • 256GB 2400MHz RAM
      • 100GB SSD for local I/O
  • Large Memory Nodes (40 cores per node)
    • 2 x Dell R820 2U Rack Server (LargeMem)
      • Four Intel Xeon E5-4640v2 processors (40 cores total @ 2.2GHz)
      • 5TB LRDIMM RAM @ 1600MHz
      • 6 Terabytes (8 x 1.2TB) of raw local disk dedicated to the node
      • 10Gb/s Dedicated Ethernet NAS Connectivity
    • 56Gb/s Mellanox FDR Infiniband Interconnect
    • 40Gb/s Mellanox Ethernet Interconnect
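
The following is a minimal sketch, not a Hatteras-specific script, of how a user might submit one of the 512-way parallel jobs described above through SLURM from Python. The partition name and application path are hypothetical placeholders; actual values depend on the local SLURM configuration.

    # Illustrative only: submit a 512-way MPI job to a SLURM-managed cluster
    import subprocess
    import textwrap

    job_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=demo-512way
        #SBATCH --partition=batch          # hypothetical partition name
        #SBATCH --ntasks=512               # one 512-way parallel job
        #SBATCH --time=02:00:00
        srun ./my_mpi_application          # hypothetical MPI executable
        """)

    # sbatch reads the job script from standard input when no file is supplied
    result = subprocess.run(["sbatch"], input=job_script, text=True,
                            capture_output=True, check=True)
    print(result.stdout.strip())           # e.g. "Submitted batch job <id>"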

Kubernetes

RENCI uses several Kubernetes clusters for development and production workloads. Currently, two are deployed on premises and several project-specific clusters are hosted in the cloud on Google Kubernetes Engine (GKE). We configure our GKE clusters to take advantage of several fully automated features that provide a wide variety of resources to applications. These features include (an illustrative scheduling sketch follows the list):

  • Node Types: compute- vs. memory-optimized, GPU-capable, local storage
  • Auto Scaling: scales the number of nodes based on resource utilization (CPU, memory, GPU)
  • Node Pools: specific node pools are used depending on resource needs
  • Node Repair: an auto-repair process is initiated for unhealthy nodes
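
The sketch below is illustrative rather than an actual RENCI manifest; it uses the Kubernetes Python client to show how a workload can target a GPU-capable node pool and declare resource requests, which is what drives the node-pool selection and autoscaling described above. The namespace, node-pool name, and container image are assumptions.

    # Illustrative sketch: schedule a GPU workload onto a dedicated GKE node pool
    from kubernetes import client, config

    config.load_kube_config()  # assumes a kubeconfig entry for the GKE cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-demo"),
        spec=client.V1PodSpec(
            # GKE labels each node with its node pool; "gpu-pool" is a placeholder
            node_selector={"cloud.google.com/gke-nodepool": "gpu-pool"},
            containers=[
                client.V1Container(
                    name="trainer",
                    image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # assumed image
                    command=["nvidia-smi"],
                    resources=client.V1ResourceRequirements(
                        requests={"cpu": "4", "memory": "16Gi"},
                        limits={"nvidia.com/gpu": "1"},  # requests one GPU
                    ),
                )
            ],
            restart_policy="Never",
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)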

For our local infrastructure, listed below, we use scripts to automate the deployment of clusters and applications (a minimal automation sketch follows the list).

  • Mitchell, on-prem production cluster with 1 TB of persistent storage
    • Nodes
      • 4 x VMs with 32 GB RAM, 4 vCPUs
      • Dell PowerEdge R740 server (Harpo)
        • 2 x Intel Xeon Gold 5115 CPU @ 2.40GHz (10 cores)
        • 576 GB RAM
        • 3 TB local SSD storage
        • NVIDIA Tesla v100 GPU
      • Dell PowerEdge R840 server (Arrival)
        • 4 x Intel Xeon Gold 6138 CPU @ 2.00GHz (20 cores)
        • 512 GB RAM
        • 9 TB local SSD storage
  • Blackbalsam, on-prem development cluster with 500 GB of persistent storage
    • Nodes
      • 15 x VMs with 32 GB RAM, 4 vCPUs
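
As a minimal sketch of the scripted deployments mentioned above (the kubeconfig context name and manifest path are hypothetical), an application can be applied to one of the on-prem clusters with the Kubernetes Python client:

    # Illustrative automation sketch: apply a manifest to an on-prem cluster
    from kubernetes import client, config, utils

    # The context name is a placeholder for the target cluster's kubeconfig entry
    config.load_kube_config(context="mitchell")

    api_client = client.ApiClient()

    # Create every object defined in the manifest file on the cluster
    utils.create_from_yaml(api_client, "manifests/app.yaml", namespace="default")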

GPUs

Standalone servers with GPUs (Graphics Processing Units) are available. Currently there are four NVIDIA Tesla V100 cards and two NVIDIA Tesla P100 cards.
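
As a simple illustration (assuming a CUDA-enabled PyTorch installation rather than any RENCI-specific tooling), a user on one of these servers can confirm GPU visibility as follows:

    # Enumerate the GPUs visible to PyTorch on the local server
    import torch

    print("CUDA available:", torch.cuda.is_available())
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")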

In addition to these systems, a Microsoft-sponsored research project has enabled the deployment of a comprehensive environment supporting both research and general RENCI operations. The project encourages the use of leading-edge technologies and of services across the Microsoft enterprise IT platform. The systems currently deployed include:

  • 5 TB SQL Server available to the research community
  • 36 TB SQL Server dedicated to a specific genetics research project

In addition to these capabilities, the deployed Microsoft-based enterprise IT services provide a complete platform for research and operations, including: directory services (AD); federated identity (ADFS); inventory, monitoring, and configuration management (System Center); source control and SDLC (Visual Studio Team Foundation Server); and a generally available Windows login node (Remote Desktop Services).

The RENCI Storage Infrastructure includes:

  • NetApp Clustered Data ONTAP
    • FAS8020 node HA Pair
      • 4PB Raw; 360 x 4TB 7.2kRPM Disks
      • 2TB FlashCache
    • AFF-A300 node HA Pair
      • 218TB Raw; 136 x 1.6TB SSD
    • FAS8060 node HA Pair
      • 3PB Raw; 828 x 4TB 7.2kRPM Disks
      • 8TB FlashCache
  • Isilon OneFS Cluster
    • Six NL410 nodes
      • 2PB Raw
      • 48GB RAM
      • 800GB SSD for R/W cache
    • Twelve A2000 nodes
      • 3PB Raw
      • 16GB RAM
      • 800GB SSD for R/W cache

The RENCI production network connects to the North Carolina Research and Education Network (NCREN) and the University of North Carolina’s campus network. NCREN provides connectivity to the Internet2 Layer 3 service at 100Gbps, and RENCI shares a 100G interface on AL2S for bandwidth-on-demand applications with other Triangle campuses (Duke, NCSU).

RENCI’s production connectivity to the outside world at 20Gbps is supported through a pair of Arista routers managed by UNC ITS (Information Technology Services) and RENCI staff. Connectivity into the datacenter is facilitated through a mix of switches managed by UNC ITS and RENCI. RENCI’s internal datacenter network infrastructure is supported by two Arista switches configured in a Multi-Chassis Link Aggregation (MLAG) pair capable of supporting 10/25/100Gb/s connections. This allows RENCI to cleanly separate production, research, and experimental networking infrastructures so that they can coexist without interfering with each other.

The RENCI datacenter hosts a deployment of perfSONAR servers (ps1.renci.org and ps2.renci.org) as well as a Bro IDS processing traffic at line rate.

The Layer 2 Breakable Experimental Network (BEN; http://ben.renci.org) is the primary platform for RENCI experimental network research. It consists of several segments of NCNI dark fiber across the Triangle area of North Carolina, a time-shared resource available to the Triangle research community. BEN is a research facility created for researchers and scientists to promote scientific discovery by providing participating universities with world-class infrastructure and resources for experimentation with disruptive technologies. BEN provides non-production network connectivity between RENCI, the UNC main campus, Duke, and NCSU. BEN PoPs (Points of Presence) distributed across the Triangle metro region form a research testbed. RENCI acts as caretaker of the facility as well as a participant in the experimentation activities on BEN.

On BEN, RENCI has deployed a number of Corsa virtualizable OpenFlow switches supporting connectivity at 10Gbps across sites. Each site is also equipped with a load-generating server to support performance measurements.

BEN can also act as a dark-fiber testbed, as the Corsa switches can be disconnected from the fiber to be replaced by other types of equipment, as needed by the research community.

GENI Infrastructure

Through a project named ExoGENI (http://www.exogeni.net), funded by the NSF through the GENI Project Office, RENCI has deployed 12 ‘GENI Racks’ at university campuses across the US, with each rack consisting of 10 IBM x3650 M4 worker nodes controlled through a head node, a 6TB iSCSI storage array, and a 10Gbps/40Gbps BNT 8264 OpenFlow switch. A number of racks from other vendors were ‘opted into’ the system at the request of owner campuses (NICTA, Sydney, Australia; UvA, Amsterdam, The Netherlands; GWU, Washington, DC; WVnet, WV; Ciena labs in Ottawa, CA and Hanover, MD; UNF, Jacksonville, FL; PUCP, Lima, Peru; and others).

These ‘ExoGENI’ racks constitute a ‘networked cloud’ prototype infrastructure running OpenStack and xCAT software intended for GENI experimenters but also suitable for distributed computation experiments in various domain sciences. The scheduling and orchestration of resources and the GENI programming interface for ExoGENI are provided by the ORCA (Open Resource Control Architecture) software designed jointly by RENCI and Duke University.

Racks are connected to the public Internet for management access. In addition, they are connected to Internet2 ION and AL2S Layer 2 services, as well as the ESnet Layer 2 OSCARS service, through a number of regional providers (LEARN, MCNC, CENIC, MERIT, OARnet). These provide connections between racks, as well as with other elements of GENI infrastructure and non-GENI resources on various campuses.

More information is available at http://www.exogeni.net.

Visualization Infrastructure

The visualization component of RENCI is composed of conference rooms with video capability.

Europa Center

Videoconferencing: There are seven videoconferencing rooms at Europa: one with five projectors, two with three projectors, and four with a single projector or large LCD display.

Virtualization Infrastructure

RENCI has two VMware vSphere Enterprise clusters that serve the needs of most projects.

  • Europa Center Cluster (located onsite at RENCI in the Europa datacenter)
    • 5 x Dell PowerEdge R740
      • 2 x 2.1GHz Intel Xeon Gold 6252 CPU (48 cores total)
      • 5 TB System memory
      • 4 x 10 GbE Network connections
  • ITS Manning Cluster (located on campus in the ITS Manning datacenter)
    • 3 x Dell PowerEdge R740
      • 2 x 3.0 GHz Intel Xeon E5-2690 Processors (20 cores total)
      • 256 GB System memory
      • 4 x 10 GbE Network connections