White Papers

Dell HPC NFS Storage Solution High Availability (NSS7.0-HA) Configuration
NSS7.0-HA client cluster configuration

Client / HPC Compute Cluster
Clients     32 PowerEdge R630s
            Each compute node has:
              CPU: Dual Intel Xeon E5-2697 v4 @ 2.3GHz, 18 cores per processor
              Memory: 16 x 8GiB 2400 MT/s
              Red Hat Enterprise Linux 7.1, kernel 3.10.0-229.el7.x86_64
HCA card    Intel OPA HFI
Switch      Intel Omnipath Fabric Edge Switch
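As a quick sanity check on the scale of the client cluster, the per-node figures in the table above can be multiplied out. The sketch below uses only numbers from the table; the variable names are illustrative, and the core count refers to physical cores (the table does not say whether hyper-threading was enabled).

```python
# Aggregate compute resources implied by the client configuration table.
# All figures come from the table; the variable names are illustrative.
NODES = 32
SOCKETS_PER_NODE = 2      # "Dual" Intel Xeon E5-2697 v4
CORES_PER_SOCKET = 18     # physical cores per processor
DIMMS_PER_NODE = 16
GIB_PER_DIMM = 8

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node
mem_per_node_gib = DIMMS_PER_NODE * GIB_PER_DIMM
total_mem_gib = NODES * mem_per_node_gib

print(f"Per node: {cores_per_node} cores, {mem_per_node_gib} GiB")
print(f"Cluster:  {total_cores} cores, {total_mem_gib} GiB")
```

This gives 36 cores and 128 GiB per node, or 1152 cores and 4096 GiB across the 32 clients.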
4.3 HA functionality
The HA functionality of the solution was tested by simulating several component failures. The design of
the tests and the test results are similar to those of previous versions of the solution because the
general architecture of the solution has not changed in this release. This section reviews the failures
and fault-tolerant mechanisms in NSS-HA solutions, and then presents the HA functionality tests with
regard to different potential failures and faults.
4.3.1.1 Potential failures and fault-tolerant mechanisms in NSS-HA
There are many different types of failures and faults that can impact the functionality of NSS-HA. Table 6
lists the potential failures that are tolerated in NSS-HA solutions.
Note: The analysis below assumes that the HA cluster service is running on the active server; the passive
server is the other component of the cluster.
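The active/passive arrangement assumed in the note can be pictured with a small toy model. This is only an illustrative sketch, not the cluster service used in NSS-HA: the names Server, HACluster, and check are hypothetical, and the handling of the case where both servers are down is an assumption added for completeness.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


class HACluster:
    """Toy two-node active/passive pair (hypothetical, not the real service)."""

    def __init__(self, active: Server, passive: Server) -> None:
        self.active = active
        self.passive = passive

    def check(self) -> str:
        """One monitoring pass: fail over if the active server is down."""
        if self.active.healthy:
            return f"service running on {self.active.name}"
        if not self.passive.healthy:
            # Assumption: with no healthy server left, the service is lost.
            return "service unavailable: both servers are down"
        # Swap roles: the former passive server now hosts the service.
        self.active, self.passive = self.passive, self.active
        return f"failed over to {self.active.name}"


cluster = HACluster(Server("server1"), Server("server2"))
print(cluster.check())
cluster.active.healthy = False  # simulate a single server failure
print(cluster.check())
print(cluster.check())
```

After the simulated failure, the service moves to the second server and keeps running there on later checks, which is the behavior the single server failure case relies on.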
NSS-HA mechanisms to handle failures

Failure type                           Mechanism to handle failure
Single local disk failure on a server  Operating system installed on a two-disk RAID1 device with
                                       one hot-spare. A single disk failure is unlikely to make the
                                       server non-functional.
Single server failure                  Monitored by the cluster service. Service fails over to the
                                       passive server.