Dell HPC NFS Storage Solution - High Availability Configurations

10) For InfiniBand clusters, copy the ibstat-script.sh file provided in Section A.11 to
/root/ibstat-script.sh on both servers. This script checks the InfiniBand link status using the
ibstat command. It is included as a resource of the cluster service so that the InfiniBand link is
monitored. (For 10 Gigabit Ethernet clusters, RHCS monitors the 10GbE link and no additional
scripts are needed.)
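As a rough illustration of the kind of check described in step 10, the sketch below parses ibstat output and reports whether any port is both Active and LinkUp. It is not the script from Section A.11. The function name ib_link_is_up and the choice to accept any single active port are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical sketch; NOT the ibstat-script.sh from Section A.11.
# Reads ibstat output on stdin and succeeds (exit status 0) if at least one
# port reports both "State: Active" and "Physical state: LinkUp".
ib_link_is_up() {
    awk '
        # Remember the logical state of the port currently being parsed.
        /^[ \t]*State:/          { state = $2 }
        # The physical state line follows; require both to be healthy.
        /^[ \t]*Physical state:/ { if (state == "Active" && $3 == "LinkUp") up = 1 }
        END { exit(up ? 0 : 1) }
    '
}

# Typical use on a server with the InfiniBand stack installed:
#   ibstat | ib_link_is_up && echo "IB link up" || echo "IB link down"
```

Reading from stdin keeps the parsing logic separate from the ibstat call, so it can be exercised with captured output on a machine without InfiniBand hardware.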

11) Copy the sas_path_check_script.sh file provided in Section A.11 to
/root/sas_path_check_script.sh on both servers. This script checks the status of a device
on the shared storage. If the device is not accessible within a set period of time, the server
reboots, prompting a failover of the cluster service to the other server. The reboot is triggered if
the cluster service is unable to stop gracefully because of the failed path.

The device to check (/dev/mapper/mpath2) and the timeout period (300 seconds) are tunable.
Check that the device named in the script, /dev/mapper/mpath2, points to a LUN on the shared
MD3200 storage. This can be verified from the output of the multipath -ll command.
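The two checks in step 11 can be sketched roughly as follows. This is not the script from Section A.11, and it only reports a result; it never reboots the server. The function names mpath_is_listed and device_is_readable are hypothetical, and the sketch assumes that timeout(1) from GNU coreutils is available, which may not be the case on older releases.

```shell
#!/bin/bash
# Hypothetical sketch; NOT the sas_path_check_script.sh from Section A.11.

# Succeeds if the multipath alias $1 begins a map entry in the
# `multipath -ll` output supplied on stdin.
mpath_is_listed() {
    grep -q "^$1[[:space:]]"
}

# Succeeds if one 512-byte block can be read from device $1 within $2 seconds.
# A cached read can succeed even when the path is down; on a block device,
# adding iflag=direct to dd bypasses the page cache.
device_is_readable() {
    timeout "$2" dd if="$1" of=/dev/null bs=512 count=1 2>/dev/null
}

# Typical use, with the device and timeout named in the text:
#   multipath -ll | mpath_is_listed mpath2 || echo "mpath2 not found"
#   device_is_readable /dev/mapper/mpath2 300 || echo "path check failed"
```

Confirming that the listed map actually belongs to the MD3200 still requires reading the vendor and product fields in the multipath -ll output by eye.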

12) Check that the public interface is up on both servers. This is the 10GbE link or the InfiniBand link.
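One way to perform the check in step 12 from the command line is sketched below, assuming a Linux kernel that exposes /sys/class/net/<interface>/operstate. The function name iface_is_up, the SYSFS_NET override, and the interface names eth2 and ib0 are placeholders for this example; substitute the actual public interface name.

```shell
#!/bin/bash
# Hypothetical sketch: succeeds if network interface $1 reports "up".
# SYSFS_NET is an assumption added so the check can be pointed at a test tree;
# by default it reads the kernel's sysfs entries.
iface_is_up() {
    local state_file="${SYSFS_NET:-/sys/class/net}/$1/operstate"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "up" ]
}

# Typical use; run on both servers (eth2 and ib0 are placeholder names):
#   iface_is_up ib0 && echo "ib0 is up" || echo "ib0 is down"
```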

13) ONLY on the active server, modify the /etc/cluster/cluster.conf file.

• Using the example cluster.conf file provided in Section A.11, update the
/etc/cluster/cluster.conf file on the active server. Add the sections for fence
devices, resources, services, etc.

• Be sure to edit the /etc/cluster/cluster.conf file to match the unique hardware
and software configuration.

This is a critical step. It defines the resources and the cluster service for the NSS-HA
cluster. An incorrect or incomplete cluster.conf file will yield a non-working cluster.
Pay attention to the following environment-specific parameters. The instructions below
include XML snippets to help identify the relevant portions of the XML file.

a) Cluster name

Make this change in two places in the XML file.

<cluster alias="NSS_HA_CLUSTER" config_version="2" name="NSS_HA_CLUSTER">

b) Correct hostnames for both R710s (active, passive)

Make this change in two places in the XML file for each host.

<failoverdomainnode name="active" priority="1"/>

c) DRAC IP address for both servers