White Papers

Dell HPC NFS Storage Solution - High Availability (NSS-HA) Configuration with Dell PowerVault
MD3260/MD3060e Storage Arrays
1. Introduction
This solution guide provides information on the latest Dell NFS Storage Solution - High Availability
(NSS-HA) configurations with Dell PowerVault MD3260 and MD3060e storage arrays. The solution uses
Dell PowerEdge servers and PowerVault storage arrays along with the Red Hat High Availability software
stack to provide an easy-to-manage, reliable, and cost-effective storage solution for HPC clusters. It
leverages the latest Dell PowerVault storage arrays (MD3260 and MD3060e) to offer denser storage
solutions than previous NSS-HA solutions. This version of the solution is NSS4.5-HA.
The design principle for this release remains the same as in previous Dell NSS-HA solutions. The major
changes from the previous version of the NSS-HA solution are the move from Dell
PowerVault MD3200 and MD1200 storage arrays to the latest PowerVault MD3260 and MD3060e storage
arrays, and the move from the RHEL 6.1 operating system to RHEL 6.3. For complete details, review
this document along with the previous NSS-HA white papers (1) (2) (3).
The following sections describe the technical details, evaluation method, and the expected
performance of the solution.
2. Overview of NSS-HA solutions
Including the current version, four versions of the NSS-HA solution have been released since 2011. This
section provides a brief description of the NSS-HA solution and lists the available Dell NSS-HA
offerings.
2.1. A brief introduction to NSS-HA solutions
The design of the NSS-HA solution for each version is identical. In general, the core of the solution is a
high availability (HA) cluster (4), which provides a highly reliable and available storage service to HPC
compute clusters via a high performance network connection such as InfiniBand (IB) or 10 Gigabit
Ethernet (10GbE).
The HA cluster consists of a pair of Dell PowerEdge servers and a network switch. The two PowerEdge
servers have shared access to disk-based Dell PowerVault storage in a variety of capacities, and both
are directly connected to an HPC cluster via IB or 10GbE. The two servers are equipped with two fence
devices: iDRAC7 Enterprise and an APC Power Distribution Unit (PDU). When a failure such as a storage
disconnection, network disconnection, or system crash occurs on one server, the HA cluster
fails over the storage service from the failed server to the healthy server with the assistance of the two
fence devices, which also ensure that the failed server does not return to life without the
administrator's knowledge or control.
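
The failover behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Red Hat cluster implementation: the class names (Node, FenceDevice, HACluster), the node names, and the order in which the fence devices are tried are all hypothetical. It models one idea from the text: the storage service moves to the healthy server only after a fence device has confirmed that the failed server is powered off.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical model of one server in the HA pair.
    name: str
    powered_on: bool = True
    healthy: bool = True


@dataclass
class FenceDevice:
    # Hypothetical stand-in for a fence device (for example iDRAC or a PDU).
    name: str
    reachable: bool = True

    def power_off(self, node: Node) -> bool:
        """Power off the node; return True only if the device could act."""
        if not self.reachable:
            return False
        node.powered_on = False
        return True


@dataclass
class HACluster:
    nodes: List[Node]
    fence_devices: List[FenceDevice]
    service_owner: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def handle_failure(self, failed: Node) -> bool:
        """Fence the failed node, then relocate the service to a healthy node."""
        failed.healthy = False
        # Try each fence device in turn (the ordering is an assumption).
        for device in self.fence_devices:
            if device.power_off(failed):
                self.log.append(f"fenced {failed.name} via {device.name}")
                break
            self.log.append(f"{device.name} could not fence {failed.name}")
        else:
            # No device succeeded: do not move the service, because the
            # failed node might still be writing to the shared storage.
            self.log.append("fencing failed; service not relocated")
            return False
        survivor = next(
            (n for n in self.nodes if n.healthy and n.powered_on), None
        )
        if survivor is None:
            self.log.append("no healthy node available")
            return False
        self.service_owner = survivor.name
        self.log.append(f"service relocated to {survivor.name}")
        return True
```

In this sketch, a cluster built with an unreachable first fence device and a reachable second one still fences the failed node through the second device before relocating the service. That is one plausible reason for equipping the servers with two independent fence devices.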
The disk-based storage array is formatted with the Red Hat Scalable File System (XFS) and exported to the
HPC cluster through the NFS service of the HA cluster. Large-capacity file systems (greater than 100TB) have
been supported since the second version of the NSS-HA solution (2).
Figure 1 depicts the general infrastructure of the NSS-HA solution. For detailed information, refer to
the previous NSS-HA white papers (1) (2) (3).