White Papers

Dell HPC NFS Storage Solution - High Availability (NSS-HA) Configuration with Dell PowerVault
MD3260/MD3060e Storage Arrays
tolerant mechanisms in NSS-HA solutions, then presents the HA functionality tests with regard to
different potential failures and faults.
4.3.1. Potential failures and fault tolerant mechanisms in NSS-HA
There are many different types of failures and faults that can impact the functionality of NSS-HA.
Table 7 lists the potential failures that are tolerated in NSS-HA solutions.
Note: The analysis below assumes that the HA cluster service is running on the active server; the
passive server is the other component of the cluster.
Table 7. NSS-HA mechanisms to handle failures

| Failure type | Mechanism to handle failure |
|---|---|
| Single local disk failure on a server | Operating system installed on a two-disk RAID 1 device with one hot spare. Single disk failure is unlikely to bring down server. |
| Single server failure | Monitored by the cluster service. Service fails over to passive server. |
| Power supply or power bus failure | Dual power supplies in each server. Each power supply connected to a separate power bus. Server continues functioning with a single power supply. |
| Fence device failure | iDRAC7 Enterprise used as primary fence device. Switched PDUs used as secondary fence devices. |
| SAS cable/port failure | Two SAS cards in each NFS server. Each card has a SAS cable to the shared storage. A single SAS card/cable failure will not impact data availability. |
| Dual SAS cable/card failure | Monitored by the cluster service. If all data paths to the shared storage are lost, service fails over to the passive server. |
| InfiniBand / 10GbE link failure | Monitored by the cluster service. Service fails over to passive server. |
| Private switch failure | Cluster service continues on the active server. If there is an additional component failure, service is stopped and system administrator intervention required. |
| Heartbeat network interface failure | Monitored by the cluster service. Service fails over to passive server. |
| RAID controller failure on Dell PowerVault MD3260 storage array | Dual controllers in the Dell PowerVault MD3260. The second controller handles all data requests. Performance may be degraded, but functionality is not impacted. |
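
The failure handling summarized in Table 7 can be sketched as a simple lookup, which may be useful when planning which failures should trigger a failover during testing. This is an illustrative sketch only and is not part of the NSS-HA software: the dictionary keys, the `Response` categories, and the function name are hypothetical labels, and the grouping of each row into a category (for example, treating a fence device failure as covered by the secondary fence device) is an interpretation of the table.

```python
from enum import Enum


class Response(Enum):
    """Hypothetical categories for the mechanisms listed in Table 7."""
    TOLERATED = "handled by redundant hardware; service stays on the active server"
    FAILOVER = "cluster service fails over to the passive server"
    CONTINUES = ("service continues on the active server; an additional "
                 "failure requires administrator intervention")


# Assumed short names for the rows of Table 7 (not identifiers from the product).
EXPECTED_RESPONSE = {
    "single_local_disk": Response.TOLERATED,         # RAID 1 with one hot spare
    "single_server": Response.FAILOVER,
    "power_supply_or_bus": Response.TOLERATED,       # dual supplies, separate buses
    "fence_device": Response.TOLERATED,              # switched PDUs as secondary fence
    "single_sas_cable_or_card": Response.TOLERATED,  # two SAS cards per server
    "dual_sas_cable_or_card": Response.FAILOVER,     # all data paths lost
    "infiniband_or_10gbe_link": Response.FAILOVER,
    "private_switch": Response.CONTINUES,
    "heartbeat_interface": Response.FAILOVER,
    "raid_controller": Response.TOLERATED,           # second controller takes over
}


def failures_causing_failover():
    """Return the failure names for which Table 7 describes a failover."""
    return sorted(
        name for name, response in EXPECTED_RESPONSE.items()
        if response is Response.FAILOVER
    )


if __name__ == "__main__":
    for name in failures_causing_failover():
        print(name)
```

Under these assumptions, four of the ten failure types in Table 7 result in a failover to the passive server; the rest are absorbed by redundant components or leave the service running on the active server.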
4.3.2. HA tests for NSS-HA
Functionality was verified for an NFSv3-based solution. The following failures were simulated on the
cluster, taking into account the failures and faults listed in Table 7.