| Failure type | Mechanism to handle failure |
|---|---|
| Power supply or power bus failure | Dual PSUs in each server. Each PSU is connected to a separate power bus. Server continues functioning with a single PSU. |
| Fence device failure | iDRAC8 Enterprise used as primary fence device. Switched PDUs used as secondary fence devices. |
| SAS cable/port failure | Two SAS cards in each NFS server. Each card has a SAS cable to each controller in the shared storage. A single SAS card or cable failure will not impact data availability. |
| Dual SAS cable/card failure | Monitored by the cluster service. If all data paths to the shared storage are lost, service fails over to the passive server. |
| OPA / InfiniBand / 10GbE link failure | Monitored by the cluster service. Service fails over to passive server. |
| Private switch failure | Cluster service continues on the active server. If there is an additional component failure, service is stopped and system administrator intervention required. |
| Heartbeat network interface failure | Monitored by the cluster service. Service fails over to passive server. |
| RAID controller failure on Dell PowerVault MD3460 storage array | Dual controllers in the Dell PowerVault MD3460. The second controller handles all data requests. Performance may be degraded, but functionality is not impacted. |
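
The SAS rows of the table can be read as a simple path-redundancy rule: two SAS cards per server, each cabled to both storage controllers, gives four data paths, and service fails over only when none of them survives. The following is a minimal sketch of that rule in Python, not the cluster service's actual logic. The names `SAS_CARDS`, `CONTROLLERS`, `surviving_paths`, and `action` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical labels: two SAS cards per NFS server, two controllers in the array.
SAS_CARDS = ("card0", "card1")
CONTROLLERS = ("ctrl0", "ctrl1")


def surviving_paths(failed_cards=(), failed_cables=()):
    """Return the (card, controller) data paths that are still usable.

    failed_cards  -- names of SAS cards that have failed
    failed_cables -- (card, controller) pairs whose cable or port has failed
    """
    return [
        (card, ctrl)
        for card, ctrl in product(SAS_CARDS, CONTROLLERS)
        if card not in failed_cards and (card, ctrl) not in failed_cables
    ]


def action(failed_cards=(), failed_cables=()):
    """Map a SAS failure scenario to the behavior listed in the table."""
    if surviving_paths(failed_cards, failed_cables):
        # At least one data path remains, so data availability is not impacted.
        return "continue on active server"
    # All data paths to the shared storage are lost.
    return "fail over to passive server"
```

For example, `action(failed_cards=("card0",))` leaves two paths through the second card and keeps service on the active server, while `action(failed_cards=("card0", "card1"))` leaves none and results in failover.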
4.3.1.2 HA tests for NSS-HA
Functionality was verified for an NFSv4-based solution. The following failures were simulated on the
cluster, taking into account the failures and faults listed in Table 6.
Server failure
Heartbeat link failure
Public link failure
Private switch failure
Fence device failure
Single SAS link failure
Multiple SAS link failures
The NSS-HA behaviors in response to these failures are:
Server failure: Simulated by introducing a kernel panic. When the active server stops functioning, the
heartbeat between the two servers is interrupted. The passive server waits for a defined period of time,
and then attempts to fence the active server. After fencing is successful, the passive server takes