OCFS2 Integration with HP Serviceguard for Linux Administrator's Guide, First Edition, November 2008
Figure 2-3 OCFS2 Disk Heartbeat Timeout Definitions
[Timeline figure marking the following points in time:]
t0: Storage links from a node are lost. I/O to disks is halted for a duration equal to the I/O timeout.
t1: Disk monitor hangs when it tries to access the failed device.
t2: I/O layer times out.
t3: Serviceguard TOCs (resets) the node once the disk monitor service fails.
t4: OCFS2 Disk Heartbeat times out.
In the figure, t0, t1, t2, t3, and t4 represent the points in time at which HP Serviceguard for
Linux and OCFS2 detect and respond to a disk access failure. The sequence of events is as follows:
1. At t0, a node loses connectivity to a storage device because of an FC link failure. This loss
of connectivity immediately halts the I/O from all nodes for a duration which is equal to the
timeout of the I/O layer.
2. At t1, the Disk Monitor Service that monitors the connectivity to disks hangs.
3. At t2, the I/O layer timeout occurs and the Disk Monitor returns with a failure to access the
disk.
4. At t3, Serviceguard ensures that the faulty node is reset, because of the Disk Monitor Service
failure. The faulty node resets and subsequently rejoins the cluster.
5. At t4, the Disk Heartbeat for OCFS2 times out.
IMPORTANT: In this procedure, the Disk Monitor is not the standard Serviceguard Disk Monitor
using cmresserviced. In this integrated environment, do not configure the cmresserviced Disk
Monitor. When you configure the OCFS2 mount point multi-node package using the ocfs2mntadm
administrative utility, the OCFS2 Disk Monitor Service is automatically configured.
In Figure 2-3, the time period between t0 and t2 is the timeout of the I/O layer of the node. Most
multi-path solutions have a timeout that ranges from 60 seconds to 120 seconds. This timeout is
determined by the driver that is configured in your environment. The recommended value of
the OCFS2 Disk Heartbeat Timeout is 30 seconds greater than the timeout of the I/O layer of the
node.
Use the following formula to determine the required value for the O2CB_HEARTBEAT_THRESHOLD
parameter:
O2CB_HEARTBEAT_THRESHOLD = (((timeout of the I/O layer <in seconds> +
30 seconds) / 2) + 1)
For example:
If the timeout of the I/O layer is 60 seconds, then according to the formula, the
O2CB_HEARTBEAT_THRESHOLD is 46. This value is a count of heartbeat iterations, not a duration
in seconds.
O2CB_HEARTBEAT_THRESHOLD = (((60 seconds + 30 seconds)/2) + 1)
O2CB_HEARTBEAT_THRESHOLD = 46
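
The formula can also be checked with a short script. The following Python sketch is illustrative only and is not part of the Serviceguard or OCFS2 tooling. The function names heartbeat_threshold and effective_timeout_s are hypothetical. The sketch assumes that the divisor 2 in the formula is the 2-second OCFS2 heartbeat interval, and that odd sums should be rounded up; the guide states neither explicitly.

```python
import math

# Assumption: the "/ 2" in the formula reflects a 2-second heartbeat interval.
HEARTBEAT_INTERVAL_S = 2
# Margin above the I/O layer timeout, as recommended in this section.
SAFETY_MARGIN_S = 30


def heartbeat_threshold(io_timeout_s: int) -> int:
    """Return O2CB_HEARTBEAT_THRESHOLD for an I/O layer timeout in seconds."""
    if io_timeout_s <= 0:
        raise ValueError("io_timeout_s must be positive")
    heartbeat_timeout_s = io_timeout_s + SAFETY_MARGIN_S
    # Assumption: round up when the sum is odd, so that the resulting
    # heartbeat timeout is never shorter than the recommended value.
    return math.ceil(heartbeat_timeout_s / HEARTBEAT_INTERVAL_S) + 1


def effective_timeout_s(threshold: int) -> int:
    """Invert the formula: heartbeat timeout in seconds for a threshold."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


if __name__ == "__main__":
    # Typical multi-path I/O timeouts from the range given in this section.
    for io_timeout in (60, 90, 120):
        threshold = heartbeat_threshold(io_timeout)
        print(
            f"I/O timeout {io_timeout} s -> "
            f"O2CB_HEARTBEAT_THRESHOLD = {threshold} "
            f"(heartbeat timeout {effective_timeout_s(threshold)} s)"
        )
```

For an I/O layer timeout of 60 seconds the sketch returns 46, which matches the worked example above. Always confirm the actual I/O layer timeout of the configured driver before applying a computed value.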