• So that Serviceguard can ensure that all I/O from a node on which a package has failed is flushed
before the package starts on an adoptive node, all the switches and routers between the NFS server
and client must support a worst-case timeout, after which packets and frames are dropped.
This timeout is known as the Maximum Bridge Transit Delay (MBTD). Switches and routers that do
not support MBTD must not be used in a Serviceguard configuration, because they can deliver
packets after arbitrarily long delays, which could lead to data corruption.
• Networking among the Serviceguard nodes must be configured in such a way that a single failure
in the network does not cause a package failure.
Setting up the NFS server
See the “NFS Services Administrator’s Guide” for instructions on configuring the NFS server and
shares.
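For example, a share entry on the NFS server might look like the following sketch. The export path
and client hostnames are placeholders; substitute the directory and access options appropriate for
your environment, as described in the guide referenced above:

    # /etc/dfs/dfstab entry on the NFS server (illustrative)
    share -F nfs -o rw=client-1:client-2 /export/nfsdata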
Configuring the NFS package and cluster parameters
Configuring the NFS package parameters
• In the modular package configuration file, the new parameter fs_server specifies the name of the
NFS server. The value of this parameter can be either the hostname of the NFS server or its IP
address (both IPv4 and IPv6 addresses are supported).
The NFS server can be configured on a different subnet or in a different domain than the
Serviceguard cluster.
• fs_type specifies the filesystem type. Set this to “nfs” to use this feature (see the example after
this list).
• fs_mount_opt specifies the mount option. This must include “-o llock” in addition to any other options
you specify. “-o llock” specifies local locking for the NFS filesystem.
• fs_fsck_opt should not be used. If any option is found in fs_fsck_opt for an NFS-imported filesystem,
a warning will be logged and the value will be ignored.
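Putting these together, the filesystem-related portion of a modular package configuration file might
look like the following sketch. The server name, export path, and local mount point are placeholders,
and the fs_name and fs_directory entries are shown only for context; consult the package
configuration template for their exact usage:

    fs_name        /export/nfsdata   # directory exported by the NFS server (placeholder)
    fs_server      nfs-server        # hostname or IP address (IPv4 or IPv6) of the NFS server
    fs_directory   /nfsdata          # local mount point on the cluster node (placeholder)
    fs_type        "nfs"             # identifies this as an NFS-imported filesystem
    fs_mount_opt   "-o llock"        # local locking is required for NFS-imported filesystems
    # fs_fsck_opt is left unset; any value is ignored for NFS-imported filesystems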
Configuring the cluster parameter CONFIGURED_IO_TIMEOUT_EXTENSION
In a Serviceguard cluster in which NFS-imported filesystems are used, an unlikely but possible
scenario exists in which data corruption could occur. The scenario is as follows:
1. A Serviceguard package using an NFS filesystem (“NFSPkg”) is running on cluster node “client-1”.
2. Node “client-1” issues an NFS write request immediately before NFSPkg moves to another cluster
node.
3. NFSPkg is started on the adoptive node “client-2”.
4. Adoptive node “client-2” begins sending NFS write requests to the same file and offset as the write
request previously sent by “client-1” just before the package was moved.
5. If the original NFS write request from “client-1” arrives at the NFS server after the new write
requests from “client-2”, the server overwrites the data written by “client-2”, resulting in data
corruption.
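To guard against this scenario, the cluster parameter CONFIGURED_IO_TIMEOUT_EXTENSION is set in
the cluster configuration file. The following line is only a sketch; the value shown is a placeholder,
and the correct value must be calculated for your network as described in the Serviceguard
documentation:

    # Cluster configuration file (placeholder value; calculate the actual value
    # for your network per the Serviceguard documentation)
    CONFIGURED_IO_TIMEOUT_EXTENSION    5000000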