White Papers
Dell HPC NFS Storage Solution High Availability (NSS-HA) Configurations with Dell PowerEdge 12th
Generation Servers
2) Fencing – In the HA cluster, once a node notices that the other node has failed, it fences (reboots)
the failed node using a fence device. This is to make sure that only one server accesses the data at
any point to protect data integrity. In NSS-HA, a node can fence the other using the Dell iDRAC or
an APC PDU. The fence devices and corresponding fence commands are configured as part of the
HA cluster configuration process. In this case, NFS server 2 fences NFS server 1.
3) Service failover – In the HA cluster, only after a node successfully fences the other can the service
failover process start. Failover means that the HA service running previously on the failed server is
transferred to the healthy one. In this case, once NFS server 2 has successfully fenced server 1, the
HA service is transferred to and started on NFS server 2.
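
The ordering constraint in the steps above, where failover starts only after fencing succeeds, can be illustrated with a minimal Python sketch. It is a toy model, not the cluster software's implementation: the `Node` class, `fail_over`, and `reboot_fence` are hypothetical names introduced here, and the fence function only stands in for a real fence agent (the iDRAC or an APC PDU in NSS-HA).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical model of one NFS server in the two-node HA cluster."""
    name: str
    alive: bool = True
    services: List[str] = field(default_factory=list)


def fail_over(healthy: Node, failed: Node, fence: Callable[[Node], bool]) -> bool:
    """Move services from `failed` to `healthy`, but only after fencing succeeds."""
    if not fence(failed):
        # Without a confirmed fence, both nodes could still access the shared
        # storage, so the service must not be started on the healthy node.
        return False
    healthy.services.extend(failed.services)
    failed.services.clear()
    return True


def reboot_fence(node: Node) -> bool:
    """Stand-in for a real fence agent; assumed to always succeed here."""
    node.alive = False  # model the forced reboot as the node going offline
    return True


server1 = Node("nfs-server-1", services=["HA service"])
server2 = Node("nfs-server-2")

# NFS server 2 fences NFS server 1, then takes over the HA service.
moved = fail_over(server2, server1, reboot_fence)
print(moved, server2.services, server1.services)
```

If the fence callable returns `False`, `fail_over` leaves the service where it was. This mirrors the rule that the service is never started on a second node while the first might still be writing to the shared storage.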
Figure 2. A failure scenario in NSS-HA
From the perspective of the HA cluster, performance degrades during the actual HA failover
process. However, the failover is as transparent to the cluster as possible, and user applications
continue to function and access data as before.
The HA service is defined and configured during the cluster configuration process. In NSS-HA, the NFS
export, the service IP through which the compute nodes access the NFS server, and the logical volume
manager (LVM) are together configured as an HA service.
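
As a rough illustration of treating these three resources as one unit, the Python sketch below groups them into a single list. The resource values (volume name, export path, IP address) are hypothetical placeholders, and the start and stop ordering is an assumption made for illustration. Neither is taken from the NSS-HA configuration.

```python
from typing import List, NamedTuple


class Resource(NamedTuple):
    kind: str   # "lvm", "nfs_export", or "service_ip"
    value: str  # hypothetical identifier, for illustration only


# Hypothetical values; the source does not specify names or addresses.
HA_SERVICE: List[Resource] = [
    Resource("lvm", "vg_example/lv_example"),
    Resource("nfs_export", "/mnt/example"),
    Resource("service_ip", "192.0.2.10"),
]


def start_order(service: List[Resource]) -> List[str]:
    """Resource kinds in the (assumed) order they are brought up."""
    return [r.kind for r in service]


def stop_order(service: List[Resource]) -> List[str]:
    """Reverse of the start order, so the IP is released before the storage."""
    return [r.kind for r in reversed(service)]


print(start_order(HA_SERVICE))
print(stop_order(HA_SERVICE))
```

Grouping the resources this way reflects the idea that the volume, export, and service IP move between servers together during a failover and are not relocated individually.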
2.2. NSS-HA offerings from Dell
The current Dell NSS-HA solution continues the evolutionary growth of the Dell NFS Storage Solution HA
family, the NSS-HA. The first NSS-HA solution for HPC from Dell used the Dell PowerEdge R710 servers,
the Dell PowerVault MD3200 RAID array and the Dell PowerVault MD1200 expansion chassis. At the
time, 2-TB drives provided the best value in terms of capacity for each dollar. The second version of
NSS-HA used the same Dell PowerEdge R710 servers, but integrated 3-TB hard drives for improved
capacity and broke the 100-TB supported capacity limit for a Red Hat-based scalable file system.
With the introduction of the third version of NSS-HA (described in this document), Dell is upgrading the
servers to a smaller form factor Dell PowerEdge R620 server, which features the Intel E5-2600
processors (codenamed Sandy Bridge EP). The 1U Dell PowerEdge R620 server, with the integrated
PCIe Gen-3 I/O capabilities of the Intel E5-2600 processor, allows for a faster interconnect using the