Dell HPC NFS Storage Solution - High Availability Configurations
10) For InfiniBand clusters, copy the ibstat-script.sh file provided in Section A.11 to
/root/ibstat-script.sh on both servers. This script checks the InfiniBand link status using the
ibstat command. It is included as a resource of the cluster service to ensure that the InfiniBand
link is monitored. (For 10 Gigabit Ethernet clusters, RHCS monitors the 10GbE link and no
additional scripts are needed.)
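The script to deploy is the one listed in Section A.11. Purely as an illustration of the kind of check such a script performs, the following minimal sketch scans ibstat output for a port that is both Active and LinkUp. The function name ib_link_is_up is hypothetical, and the matching assumes the usual "State:" and "Physical state:" lines printed by ibstat.

```shell
# Hypothetical sketch, not the script from Section A.11.
# Reads ibstat-style output on stdin and succeeds (exit 0) only if some
# port reports both "State: Active" and "Physical state: LinkUp".
ib_link_is_up() {
    awk '
        /^[[:space:]]*Port [0-9]+:/                      { active = 0 }  # start of a new port block
        /^[[:space:]]*State:[[:space:]]*Active/          { active = 1 }
        /^[[:space:]]*Physical state:[[:space:]]*LinkUp/ { if (active) found = 1 }
        END { exit !found }
    '
}

# Typical use on a server with the InfiniBand stack installed:
#   ibstat | ib_link_is_up && echo "IB link up" || echo "IB link down"
```

Reading from stdin keeps the parsing separate from the ibstat call, so the logic can be exercised with captured output on a machine that has no InfiniBand adapter.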
11) Copy the sas_path_check_script.sh file provided in Section A.11 to
/root/sas_path_check_script.sh on both servers. This script checks the status of a device
on the shared storage. If the device is not accessible within a set period of time, the server
reboots, prompting a failover of the cluster service to the other server. The reboot is triggered
if the cluster service is unable to stop gracefully because of the failed path.
The device to check (/dev/mapper/mpath2) and the timeout period (300 seconds) are tunable.
Check that the device in the script, /dev/mapper/mpath2, points to a LUN on the shared MD3200
storage. This can be checked by looking at the output of the multipath -ll command.
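The script to deploy is the one listed in Section A.11. To illustrate the idea of a time-limited device check, here is a minimal sketch, assuming the GNU coreutils timeout and dd commands are available on the server. The function name device_readable is hypothetical, and the sketch only reports success or failure; it does not reboot the server.

```shell
# Hypothetical sketch, not the script from Section A.11.
# Succeeds (exit 0) if one 4 KiB block can be read from the given path
# within the given number of seconds; fails otherwise.
device_readable() {
    local dev="$1"
    local limit="${2:-300}"   # seconds; 300 matches the timeout mentioned in the text
    # timeout sends dd a signal when the limit expires, so a read that
    # stalls (for example on a failed SAS path) is reported as a failure.
    # A production check on a block device would probably add iflag=direct
    # so that the read is not satisfied from the page cache.
    timeout "$limit" dd if="$dev" of=/dev/null bs=4096 count=1 2>/dev/null
}

# Example (device name taken from the text above):
#   device_readable /dev/mapper/mpath2 300 || echo "shared storage not reachable"
```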
12) Check that the public interface is up on both servers. This is the 10GbE link or the InfiniBand link.
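One possible way to perform this check is to read the interface's operstate from sysfs. The sketch below is an illustration only: the function name iface_is_up is hypothetical, and the interface name (ib0 in the example) is an assumption that depends on the server's configuration.

```shell
# Hypothetical helper: succeeds if the named interface reports operstate "up".
# The optional second argument overrides the sysfs directory (useful for testing).
iface_is_up() {
    local ifname="$1"
    local sysdir="${2:-/sys/class/net}"
    [ "$(cat "$sysdir/$ifname/operstate" 2>/dev/null)" = "up" ]
}

# Example, run on each server (ib0 is an assumed interface name):
#   iface_is_up ib0 && echo "public interface is up"
```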
13) ONLY on the active server, modify the /etc/cluster/cluster.conf file.
Using the example cluster.conf file provided in Section A.11, update the
/etc/cluster/cluster.conf file on the active server. Add in the sections for fence
devices, resources, services, etc.
Be sure to edit the /etc/cluster/cluster.conf file to match the unique hardware
and software configuration.
This is a critical step. It defines the resources and the cluster service for the NSS-HA
cluster. An incorrect or incomplete cluster.conf file will yield a non-working cluster.
Pay attention to the following environment-specific parameters. The instructions below
include XML snippets to help identify the relevant portions of the XML file.
a) Cluster name
Make this change in two places in the XML file.
<cluster alias="NSS_HA_CLUSTER" config_version="2"
name="NSS_HA_CLUSTER">
b) Correct hostnames for both R710s (active, passive)
Make this change in two places in the XML file for each host.
<clusternode name="active" nodeid="1" votes="1">
<clusternode name="passive" nodeid="2" votes="1">
<failoverdomainnode name="active" priority="1"/>
<failoverdomainnode name="passive" priority="2"/>
c) DRAC IP address for both servers