Table 5.  NSS6.0-HA client cluster configuration

Client / HPC Compute Cluster

Clients
- 64 PowerEdge M420 blade servers: 32 blades in each of two PowerEdge M1000e chassis.
- Red Hat Enterprise Linux 6.4 x86-64.

Chassis configuration
- Two PowerEdge M1000e chassis, each with 32 blades.
- Two Mellanox M4001F FDR10 I/O modules per chassis.
- Two PowerConnect M6220 I/O switch modules per chassis.

InfiniBand
- Each blade server has one Mellanox ConnectX-3 dual-port FDR10 mezzanine I/O card.
- Mellanox OFED 2.3-1.0.1.

InfiniBand fabric for I/O traffic
- Each PowerEdge M1000e chassis has two Mellanox M4001F FDR10 I/O module switches.
- Each FDR10 I/O module has four uplinks to a rack Mellanox SX6025 FDR switch, for a total of 16 uplinks.
- The FDR rack switch has a single FDR link to the NFS server.

Ethernet
- Each blade server has one onboard 10GbE Broadcom 57810 network adapter.

Ethernet fabric for cluster deployment and management
- Each PowerEdge M1000e chassis has two PowerConnect M6220 Ethernet switch modules.
- Each M6220 switch module has one link to a rack PowerConnect 5224 switch.
- There is one link from the rack PowerConnect switch to an Ethernet interface on the cluster master node.
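
As a quick way to confirm that a compute blade matches the client configuration in Table 5, a small script such as the following sketch can query the standard OFED and Linux tools (ibstat, ofed_info, ethtool). The interface name em1 and the exact output strings checked here are assumptions and may differ between installations; this is a sketch, not part of the solution deliverables.

```python
#!/usr/bin/env python
# Minimal client-side sanity check for the fabric described in Table 5.
# ibstat, ofed_info, and ethtool are standard OFED/Linux tools; the interface
# name 'em1' below is an assumption and may differ per site.
import subprocess

def run(cmd):
    """Run a command and return its stdout as text (empty string on failure)."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode()
    except (OSError, subprocess.CalledProcessError):
        return ""

def main():
    # 1. InfiniBand HCA state: the ConnectX-3 FDR10 port should be Active/LinkUp.
    ibstat = run(["ibstat"])
    print("IB port active:",
          "State: Active" in ibstat and "Physical state: LinkUp" in ibstat)

    # 2. OFED stack version: Table 5 lists Mellanox OFED 2.3-1.0.1.
    print("OFED version:", run(["ofed_info", "-s"]).strip())

    # 3. 10GbE adapter: the onboard Broadcom 57810 should report a 10000Mb/s link.
    ethtool = run(["ethtool", "em1"])   # 'em1' is an assumed interface name
    print("10GbE link up:",
          "Speed: 10000Mb/s" in ethtool and "Link detected: yes" in ethtool)

if __name__ == "__main__":
    main()
```

Running a check of this kind on each blade before the HA tests helps rule out client-side fabric problems as a source of error.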
4.3. HA functionality
The HA functionality of the solution was tested by simulating several component failures. Because the general architecture of the solution has not changed in this release, the design of the tests and the test results are similar to those of previous versions of the solution. This section reviews the failures and fault-tolerant mechanisms in NSS-HA solutions, and then presents the HA functionality tests with regard to the different potential failures and faults.
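
From a client's point of view, the general pattern of these tests is to keep I/O running against the NFS mount while a fault is injected on the server pair, and then to verify that the I/O stream resumes after failover. The sketch below illustrates that pattern only; the mount point /mnt/nss, the probe duration, and the block size are assumptions, and this is not the harness used to produce the results in this paper.

```python
#!/usr/bin/env python
# Sketch of a client-side probe for an HA functionality test: write steadily to
# the NFS mount while a failure is simulated on the server pair, then report the
# longest I/O stall observed. Paths and timings are assumptions.
import os
import time

NFS_FILE = "/mnt/nss/ha_probe.dat"   # assumed NFS mount point
DURATION = 600                       # probe for 10 minutes while the fault is injected
BLOCK = b"x" * (1 << 20)             # 1 MiB per write

def probe():
    longest_stall = 0.0
    end = time.time() + DURATION
    with open(NFS_FILE, "wb") as f:
        while time.time() < end:
            start = time.time()
            f.write(BLOCK)           # blocks while the NFS service fails over
            f.flush()
            os.fsync(f.fileno())
            longest_stall = max(longest_stall, time.time() - start)
    return longest_stall

if __name__ == "__main__":
    print("Longest I/O stall during the test: %.1f s" % probe())
```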
4.3.1. Potential failures and fault-tolerant mechanisms in NSS-HA
There are many different types of failures and faults that can impact the functionality of NSS-HA.
Table 6 lists the potential failures that are tolerated in NSS-HA solutions.