Dell HPC NFS Storage Solution High Availability (NSS-HA) Configurations with Dell PowerEdge 12th Generation Servers
Fence devices – Fence devices are required for fencing (rebooting) a failed or misbehaving
cluster node in the HA cluster. In the NSS-HA solution, two types of fence devices are configured:
switched power distribution units (PDUs) and the Dell server management controller, the iDRAC.
Figure 1. The infrastructure of the NSS-HA solution
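As a rough illustration of why configuring more than one type of fence device is useful, the following Python sketch tries each configured device in turn until one of them reboots the target node. The `FenceDevice` class, the `fence_node` function, and the iDRAC-then-PDU order are hypothetical assumptions made for illustration only. They are not the Red Hat fence agents and not the actual NSS-HA fencing configuration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FenceDevice:
    """Hypothetical stand-in for one fence method (e.g. iDRAC or switched PDU)."""
    name: str
    # Returns True if the target node was successfully power-cycled.
    reboot: Callable[[str], bool]


def fence_node(node: str, devices: List[FenceDevice]) -> str:
    """Try each configured fence device in order until one succeeds.

    Returns the name of the device that fenced the node. Raises
    RuntimeError if every device fails, so that in this sketch the caller
    never takes over the service of a node that was not fenced.
    """
    for device in devices:
        try:
            if device.reboot(node):
                return device.name
        except Exception:
            # Treat an unreachable or erroring device as a failed attempt
            # and fall through to the next configured method.
            pass
    raise RuntimeError(f"all fence devices failed for {node}")


if __name__ == "__main__":
    def idrac_down(node: str) -> bool:
        # Simulate a management controller that cannot be reached.
        raise ConnectionError("iDRAC unreachable")

    devices = [FenceDevice("iDRAC", idrac_down),
               FenceDevice("PDU", lambda node: True)]
    print(f"fenced by {fence_node('nfs-server-1', devices)}")  # prints "fenced by PDU"
```

With two independent fence paths, a failure of one path (for example, an unresponsive management controller) does not leave the cluster unable to fence the failed node.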
A major goal of the NSS-HA solution is to improve storage service availability in the presence of
failures or faults. This goal is achieved through a failover process implemented by the Red Hat
Enterprise High Availability Cluster software stack. The failover process has three stages:
failure detection, fencing, and service failover.
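The ordering of the three stages can be sketched in Python as a single check run from the standby node's point of view. This is a simplified model, not the Red Hat cluster software: the `Node` and `Service` classes, the `failover` function, the heartbeat timeout, and the node names are hypothetical, and the resource list only echoes the examples given in this section (file system and service IP address).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (seconds) the last heartbeat was received


@dataclass
class Service:
    """Storage service and the HA resources that move with it."""
    owner: str
    resources: List[str] = field(
        default_factory=lambda: ["file system", "service IP address"])


def failover(active: Node, standby: str, service: Service, now: float,
             timeout: float, fence: Callable[[str], bool]) -> List[str]:
    """Run the three failover stages once, as seen from the standby node.

    Returns a log of the actions taken; an empty list means no failure
    was detected.
    """
    log: List[str] = []

    # Stage 1: failure detection. The active node is declared failed only
    # when no heartbeat has arrived within the (hypothetical) timeout.
    if now - active.last_heartbeat <= timeout:
        return log
    log.append(f"detected: {active.name} missed heartbeats")

    # Stage 2: fencing. Reboot the failed node before touching the service;
    # if fencing fails, stop here rather than risk two nodes serving the
    # same storage.
    if not fence(active.name):
        log.append(f"fencing {active.name} failed; service not moved")
        return log
    log.append(f"fenced: {active.name}")

    # Stage 3: service failover. The standby node starts each resource
    # and becomes the new owner of the storage service.
    for resource in service.resources:
        log.append(f"{standby} started {resource}")
    service.owner = standby
    return log


if __name__ == "__main__":
    server1 = Node("nfs-server-1", last_heartbeat=100.0)
    svc = Service(owner="nfs-server-1")
    for line in failover(server1, "nfs-server-2", svc, now=130.0,
                         timeout=10.0, fence=lambda name: True):
        print(line)
```

The key design point the sketch captures is that fencing sits between detection and takeover: the standby node moves the service only after the failed node is known to be rebooted.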
Figure 2 shows a typical scenario of how storage service availability is maintained in the NSS-HA
solution. In this scenario, a kernel crash occurs on the active NFS server, which serves as the NFS
gateway for the HA cluster.
1) Failure detection – Resources related to the storage service, such as the file system and the service
IP address, are defined, configured, and monitored for health by the HA cluster, so any interruption in
access to the storage is detected. In this scenario, once a kernel crash occurs on NFS server 1 (the
active one), NFS server 2 stops receiving heartbeat signals from server 1 and recognizes that
server 1 has failed.