White Papers

Dell HPC NFS Storage Solution High Availability Configurations with Large Capacities
A.7. NSS HA cluster setup
In this recipe the term “cluster” refers to the active-passive NSS-HA Red Hat cluster.
A.7.1. Prepare
1. On both R710s, install the cluster software packages and start the ricci service.
# yum install -y ricci rgmanager cman openais lvm2-cluster ccs
# service ricci start; chkconfig ricci on
2. Set a password for the user ricci using the command below.
# passwd ricci
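Before moving on, it can help to confirm that every package from step 1 is present on both servers. The following is a minimal sketch, assuming the package names from the yum command above; missing_packages is a hypothetical helper name and is not part of the cluster software.

```shell
# Hypothetical helper (not part of RHCS): reads installed package names,
# one per line, on stdin and prints each required package that is absent.
REQUIRED_PKGS="ricci rgmanager cman openais lvm2-cluster ccs"

missing_packages() {
    installed_list=$(cat)
    for pkg in $REQUIRED_PKGS; do
        # -x matches the whole line, so "ccs" does not match "ccs-devel".
        printf '%s\n' "$installed_list" | grep -qxF -- "$pkg" || echo "$pkg"
    done
}
```

On a server it could be fed from the RPM database, for example rpm -qa --qf '%{NAME}\n' | missing_packages, which prints nothing when all packages are installed.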
3. Create a mount point for the file system.
On both servers, create a mount point for the XFS file system. This is the directory where the XFS
file system will be mounted and that will be exported over NFS to the clients.
# mkdir /mnt/xfs1
If you plan to use the storage for home directories, you also need to configure SELinux to allow it.
# chcon -t home_root_t /mnt/xfs1
Similarly, the clients need SELinux to allow NFS home directories.
# setsebool -P use_nfs_home_dirs 1
Alternatively, you can use the following command to disable SELinux on the clients, the servers, or
both. A reboot is needed for the change to take effect.
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
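To confirm which SELinux mode will apply after the reboot, the configured value can be read back from the configuration file. This is a minimal sketch, assuming the configuration lives in /etc/selinux/config (the file that /etc/sysconfig/selinux links to on Red Hat systems); selinux_config_mode is a hypothetical helper name.

```shell
# Hypothetical helper: print the SELINUX= value set in a config file.
# The default path is an assumption; pass another path as the first argument.
selinux_config_mode() {
    cfg="${1:-/etc/selinux/config}"
    # Only uncommented lines that begin with SELINUX= are considered;
    # comment lines and SELINUXTYPE= are ignored.
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$cfg" | tail -n 1
}
```

The result can be compared with the output of getenforce, which reports the mode currently in effect rather than the one configured for the next boot.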
4. For InfiniBand clusters, copy the ibstat_script.sh file provided in Section A.11 to
/root/ibstat_script.sh on both servers. This script checks the InfiniBand link status using the
ibstat command. It is included as a resource of the cluster service to ensure that the InfiniBand
link is monitored. (For 10 Gigabit Ethernet clusters, RHCS monitors the 10GbE link and no
additional scripts are needed.)
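The script to use is the one in Section A.11. Purely as an illustration of the kind of check involved, the sketch below parses ibstat output and succeeds when at least one port is active. It is not the A.11 script; ib_link_up is a hypothetical name, and the sketch assumes the usual ibstat layout in which each port lists a "State:" line followed by a "Physical state:" line.

```shell
# Illustration only; this is NOT ibstat_script.sh from Section A.11.
# Reads ibstat output on stdin and returns 0 if at least one port reports
# "State: Active" together with "Physical state: LinkUp", otherwise 1.
ib_link_up() {
    awk '
        /^[[:space:]]*State:/          { active = ($2 == "Active") }
        /^[[:space:]]*Physical state:/ { if (active && $3 == "LinkUp") up = 1 }
        END { exit(up ? 0 : 1) }
    '
}
```

A manual check on a server could then look like ibstat | ib_link_up && echo "InfiniBand link is up".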
5. Copy the sas_path_check.sh file provided in Section A.11 to /root/sas_path_check.sh on
both servers as shown below. This script checks the status of a device on the shared storage. If the
device is not accessible within a set period of time, the server reboots, prompting a failover of
the cluster service to the other server. The reboot is triggered when the cluster service is unable
to stop gracefully because of a failed SAS path.
The device to check (e.g., /dev/mapper/mpatha) and the timeout period (300 seconds) are
tunable. Make sure that the device named in the script, e.g., /dev/mapper/mpatha, points to a LUN
on the shared MD3200 storage; otherwise, modify the device name in the script. This can be checked
by looking at the output of the multipath -ll command.
# cp sas_path_check.sh /root/sas_path_check_1.sh
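The multipath -ll check described above can also be scripted. The following is a minimal sketch, assuming that each multipath map starts with a header line of the form "name (wwid) dm-N vendor,product" and that the MD3200 identifies itself as DELL,MD32xx; both the helper name is_shared_lun and that identifier string are assumptions to verify against the real output on your servers.

```shell
# Hypothetical helper: reads `multipath -ll` output on stdin and returns 0
# if the header line of the named map contains the expected vendor,product
# string. "DELL,MD32xx" is an assumed default; pass another string as the
# second argument if your storage reports a different identifier.
is_shared_lun() {
    map_name="$1"
    ident="${2:-DELL,MD32xx}"
    awk -v map="$map_name" -v ident="$ident" '
        $1 == map && index($0, ident) { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, multipath -ll | is_shared_lun mpatha || echo "mpatha is not on the shared storage" flags a device name that needs to be changed in the script.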