HP Serviceguard Extended Distance Cluster for Linux A.12.00.00 Deployment Guide, March 2014


3.3.1.1 Setting the Value of the Link Down Timeout Parameter

After installation, you must set the Link Down Timeout parameter for the Fibre Channel cards to a value greater than the cluster reformation time. The cluster reformation time depends on the heartbeat interval and the node timeout values configured in a particular cluster.

To get the cluster reformation time set for a particular cluster, run:

cmviewcl -v -f line | grep max_reformation_duration
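If you want to script this check, the following Python sketch pulls the value out of text captured from the command above. It assumes the `cmviewcl -f line` output prints attributes as `key=value`, possibly with a qualifier prefix separated by `|` or `:`. The helper name `max_reformation_duration` is hypothetical, and the value is returned in whatever unit `cmviewcl` reports, without conversion.

```python
from typing import Optional


def max_reformation_duration(line_output: str) -> Optional[float]:
    """Return max_reformation_duration from 'cmviewcl -v -f line' text.

    Assumption: each attribute appears as key=value on its own line, and a
    qualified key (for example "cluster:c1|max_reformation_duration") ends
    with the bare attribute name. Returns None if the attribute is absent.
    """
    for raw in line_output.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # skip lines that are not key=value pairs
        # Reduce a qualified key to its last component.
        bare_key = key.replace("|", ":").split(":")[-1]
        if bare_key == "max_reformation_duration":
            return float(value.strip())
    return None


if __name__ == "__main__":
    sample = "name=cluster1\nstatus=up\nmax_reformation_duration=28\n"
    print(max_reformation_duration(sample))  # 28.0
```

On a cluster node, the input could come from `subprocess.run(["cmviewcl", "-v", "-f", "line"], capture_output=True, text=True).stdout`.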

This parameter prevents data from being written to a disk when a failure occurs. Its value must be set so that the disks are inaccessible for a period that is greater than the cluster reformation time. This is important in scenarios where an entire site is in the process of going down. Because further writes to the MD device are blocked, the two disks of the MD device remain current and synchronized. As a result, when the package fails over, it starts with a disk that has current data.

When access to a disk is lost, the Fibre Channel cards are configured to hold up all disk access, essentially hanging for a period that is greater than the cluster reformation time. This is achieved by altering the Link Down Timeout value for each port of the card. Setting the Link Down Timeout parameter for a Fibre Channel card ensures that the MD device hangs when access to a mirror is lost. When the specified hang period expires, the MD device resumes activity. This ensures that no data is lost.
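The sizing rule in this section (a hang period strictly greater than the cluster reformation time) can be expressed as a small check. In this Python sketch, both values are assumed to be in seconds, and `margin_s` is a hypothetical safety margin chosen for illustration; the guide itself only requires the timeout to exceed the reformation time.

```python
import math


def link_down_timeout(reformation_time_s: float, margin_s: float = 5.0) -> int:
    """Suggest a whole-second Link Down Timeout above the reformation time.

    margin_s is a hypothetical illustration value, not an HP recommendation.
    Because margin_s > 0, ceil(reformation_time_s + margin_s) is always
    strictly greater than reformation_time_s.
    """
    if reformation_time_s < 0:
        raise ValueError("reformation time cannot be negative")
    if margin_s <= 0:
        raise ValueError("margin must be positive to keep the inequality strict")
    return math.ceil(reformation_time_s + margin_s)


def timeout_is_safe(timeout_s: float, reformation_time_s: float) -> bool:
    """True only if the FC card hangs longer than a cluster reformation."""
    return timeout_s > reformation_time_s


if __name__ == "__main__":
    t = link_down_timeout(28.0)
    print(t, timeout_is_safe(t, 28.0))  # 33 True
```

A timeout equal to the reformation time fails the check, matching the requirement that the hang period be greater than, not equal to, the reformation time.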

This parameter is required to address a rolling failure, in which an entire datacenter fails but its components do not all fail at the same time. In this case, if access to one disk is lost, the MD layer hangs and data is no longer written to it. Within the hang period, the node goes down and a cluster reformation takes place. When the package fails over to another node, it starts with a disk that has current data.

For example, consider a scenario where the storage link between the two arrays goes down and the Link Down Timeout value is set to a time "T". With the Link Down Timeout parameter set, the MD device hangs for time T and all I/Os are held at the FC card, with the following results:

• If the entire site fails, the node goes down. As a result, the packages are started on the secondary site with the available data.

• If the entire site does not fail, I/O hangs for time T. When T expires, the primary node is still active and the MD device resumes activity.
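The two outcomes above can be modeled as a simple decision function. This is a toy Python sketch, not Serviceguard code: `outcome_after_link_loss` and its arguments are hypothetical, and it assumes T has already been set greater than the cluster reformation time, as this section requires.

```python
def outcome_after_link_loss(timeout_t_s: float, reformation_time_s: float,
                            site_fails: bool) -> str:
    """Toy model of the storage-link-loss example in this section.

    While I/O is held at the FC card for timeout_t_s seconds, no further
    writes reach the mirror, so both disks of the MD device stay in step.
    """
    if timeout_t_s <= reformation_time_s:
        # The example assumes the hang outlasts a cluster reformation.
        raise ValueError("T must be greater than the cluster reformation time")
    if site_fails:
        # The node goes down during the hang; after reformation the
        # packages start on the secondary site with the available data.
        return "packages start on secondary site"
    # The site survives: after T expires the primary node is still active
    # and the MD device resumes activity.
    return "MD device resumes on primary node"


if __name__ == "__main__":
    print(outcome_after_link_loss(33, 28, site_fails=True))
    print(outcome_after_link_loss(33, 28, site_fails=False))
```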

3.3.1.2 Configuring Single Path to Storage

For QLogic Fibre Channel cards, you can configure the timeout using the Link Down Timeout parameter. The Link Down Timeout parameter, in turn, can be configured using the SANsurfer CLI tool on Red Hat Enterprise Linux 5 and 6 or SUSE Linux Enterprise Server 11 SP1, and using the QConvergeConsole CLI tool on SUSE Linux Enterprise Server 11 SP2.

For Emulex cards, you can configure the timeout using the Link Down Timeout parameter, which can be set using the HBAnyware tool.
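For scripted installations, the tool choices above can be captured in a lookup table. This Python sketch only encodes the combinations this section lists (SANsurfer and QConvergeConsole are QLogic tools); the OS labels such as `rhel6` and the function name `fc_timeout_tool` are hypothetical, and any unlisted combination raises an error rather than guessing.

```python
# OS labels are hypothetical shorthand for the releases named in the guide.
_QLOGIC_TOOLS = {
    "rhel5": "SANsurfer CLI",
    "rhel6": "SANsurfer CLI",
    "sles11sp1": "SANsurfer CLI",
    "sles11sp2": "QConvergeConsole CLI",
}


def fc_timeout_tool(vendor: str, os_label: str) -> str:
    """Return the tool this guide names for setting Link Down Timeout."""
    vendor = vendor.lower()
    if vendor == "emulex":
        # The guide names a single tool for Emulex cards.
        return "HBAnyware"
    if vendor == "qlogic":
        try:
            return _QLOGIC_TOOLS[os_label.lower()]
        except KeyError:
            raise ValueError(f"no tool listed for QLogic on {os_label!r}") from None
    raise ValueError(f"unsupported vendor {vendor!r}")
```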

3.3.1.3 Configuring Multiple Paths to Storage

In addition to configuring the Link Down Timeout parameter for each card as described in the previous section, you need to set the dev_loss_tmo parameter in the /etc/multipath.conf file to a value greater than the cluster reformation time.
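As an illustration, the following Python sketch renders a `defaults` section for /etc/multipath.conf with dev_loss_tmo set above the reformation time. `dev_loss_tmo` is a standard multipath.conf option given in seconds; the helper name, the rounding to whole seconds, and the default margin are assumptions for this example, and merging the output into an existing file (which may already have a `defaults` or `devices` section) is left to the administrator.

```python
import math


def multipath_defaults(reformation_time_s: float, margin_s: int = 5) -> str:
    """Render a multipath.conf 'defaults' stanza with dev_loss_tmo set
    strictly above the cluster reformation time (in seconds).

    margin_s is a hypothetical illustration value.
    """
    if margin_s < 1:
        raise ValueError("margin must be at least 1 second")
    # ceil() rounds the reformation time up to whole seconds; adding a
    # positive margin keeps dev_loss_tmo strictly greater than it.
    dev_loss_tmo = math.ceil(reformation_time_s) + margin_s
    return "defaults {\n    dev_loss_tmo %d\n}\n" % dev_loss_tmo


if __name__ == "__main__":
    print(multipath_defaults(28.0), end="")
```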

3.3.2 Using Persistent Device Names

Once the storage is configured, you need to create persistent device names using udev. After a disk or link failure and a subsequent reboot, device names can change or be reordered. Since the MD mirror device starts with the names of the component devices, a change

22 Configuring your Environment for Software RAID