Dell HPC NFS Storage Solution High Availability (NSS-HA) Configurations with Dell PowerEdge 12th
Generation Servers
In the real world, there are many different types of failures and faults that can impact the
functionality of NSS-HA. Table 10 lists the potential failures that are tolerated in NSS-HA solutions.
Note: The analysis below assumes that the HA cluster service is running on the active server; the
passive server is the other component of the cluster.
Table 10. NSS-HA mechanisms to handle failures

| Failure type | Mechanism to handle failure |
|---|---|
| Single local disk failure on a server | Operating system installed on a two-disk RAID 1 device with one hot spare. Single disk failure is unlikely to bring down server. |
| Single server failure | Monitored by the cluster service. Service fails over to passive server. |
| Power supply or power bus failure | Dual power supplies in each server. Each power supply connected to a separate power bus. Server continues functioning with a single power supply. |
| Fence device failure | iDRAC used as primary fence device. Switched PDUs used as secondary fence devices. |
| SAS cable/port failure | Two SAS cards in each NFS server. Each card has a SAS cable to storage. A single SAS card/cable failure will not impact data availability. |
| Dual SAS cable/card failure | Monitored by the cluster service. If all data paths to the storage are lost, service fails over to the passive server. |
| InfiniBand/10GbE link failure | Monitored by the cluster service. Service fails over to passive server. |
| Private switch failure | Cluster service continues on the active server. If there is an additional component failure, service is stopped and system administrator intervention required. |
| Heartbeat network interface failure | Monitored by the cluster service. Service fails over to passive server. |
| RAID controller failure on Dell PowerVault MD3200 storage array | Dual controllers in the Dell PowerVault MD3200. The second controller handles all data requests. Performance may be degraded but functionality is not impacted. |
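The behavior in Table 10 can be summarized as a small lookup, which may be useful when planning failure tests. The following Python sketch is illustrative only and is not part of the NSS-HA software: the failure labels, the `Response` enum, and the `expected_response` function are hypothetical names chosen for this example. It also reads "additional component failure" in the private switch row literally, as any other failure in the table, which is an assumption.

```python
from enum import Enum


class Response(Enum):
    """Expected outcome for a failure, paraphrased from Table 10."""
    TOLERATED = "handled by redundancy; service stays on the active server"
    FAILOVER = "service fails over to the passive server"
    ADMIN = "service is stopped; administrator intervention required"


# Hypothetical labels, one per row of Table 10.
TABLE_10 = {
    "local_disk": Response.TOLERATED,        # RAID 1 with one hot spare
    "server": Response.FAILOVER,
    "power_supply": Response.TOLERATED,      # dual supplies, separate buses
    "fence_device": Response.TOLERATED,      # iDRAC primary, switched PDU secondary
    "single_sas_path": Response.TOLERATED,   # second SAS card/cable still up
    "all_sas_paths": Response.FAILOVER,
    "public_link": Response.FAILOVER,        # InfiniBand/10GbE link
    "private_switch": Response.TOLERATED,
    "heartbeat_interface": Response.FAILOVER,
    "raid_controller": Response.TOLERATED,   # second MD3200 controller takes over
}


def expected_response(failure, private_switch_down=False):
    """Return the expected outcome for a single failure.

    If the private switch is already down, any further failure is assumed
    to stop the service (literal reading of the private switch row).
    """
    if private_switch_down and failure != "private_switch":
        return Response.ADMIN
    return TABLE_10[failure]


if __name__ == "__main__":
    for name in TABLE_10:
        print(f"{name}: {expected_response(name).value}")
```

For example, `expected_response("all_sas_paths")` yields `Response.FAILOVER`, while `expected_response("server", private_switch_down=True)` yields `Response.ADMIN`. Combinations of failures other than those involving the private switch are not described in Table 10 and are deliberately not modeled here.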
4.3.2. HA tests for NSS-HA
Functionality was verified for an NFSv3-based solution. The following failures were simulated on the
cluster, taking into account the failures and faults listed in Table 10.
Server failure
Heartbeat link failure
Public link failure