OCFS2 Integration with HP Serviceguard for Linux Administrator's Guide, First Edition, November 2008
according to figure 1, the time period between t0 and t3 is 28 seconds. So, using the formula, the
O2CB_NET_IDLE_TIMEOUT parameter is set to 48000 milliseconds.
O2CB_IDLE_TIMEOUT_MS = (28 seconds + 20 seconds) x 1000
O2CB_IDLE_TIMEOUT_MS = 48000 milliseconds
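The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name idle_timeout_ms and the constant MARGIN_SECONDS are hypothetical names chosen for this example, and the 20-second margin is taken from the worked example above.

```python
# Safety margin in seconds, taken from the worked example (28 s + 20 s).
# MARGIN_SECONDS and idle_timeout_ms are illustrative names, not part of
# OCFS2 or Serviceguard.
MARGIN_SECONDS = 20


def idle_timeout_ms(reconfig_seconds, margin_seconds=MARGIN_SECONDS):
    """Return the OCFS2 network idle timeout in milliseconds.

    reconfig_seconds is the Serviceguard cluster reconfiguration time
    (the period between t0 and t3 in the example above).
    """
    if reconfig_seconds <= 0:
        raise ValueError("reconfiguration time must be positive")
    return (reconfig_seconds + margin_seconds) * 1000


if __name__ == "__main__":
    # Reconfiguration time of 28 seconds, as in the example above.
    print(idle_timeout_ms(28))  # prints 48000
```

The result is the value, in milliseconds, that the example above assigns to the idle timeout parameter.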
Table 2-1 lists the HP Serviceguard heartbeat intervals, the corresponding cluster
reconfiguration times, and the recommended values for the O2CB_NET_IDLE_TIMEOUT parameter.
Use this table to configure the O2CB_NET_IDLE_TIMEOUT parameter.
Table 2-1 HP Serviceguard and OCFS2 Values
Configurations with 8 nodes or fewer:

Heartbeat Interval   Cluster Reconfiguration Time   OCFS2 Network Idle Timeout
(in seconds)         (in seconds/milliseconds)      (in seconds/milliseconds)
1                    28/28000                       48/48000
2                    56/56000                       76/76000
5                    140/140000                     160/160000
7                    196/196000                     216/216000
10                   280/280000                     300/300000

Configurations with more than 8 nodes and less than 16 nodes:

Heartbeat Interval   Cluster Reconfiguration Time   OCFS2 Network Idle Timeout
(in seconds)         (in seconds/milliseconds)      (in seconds/milliseconds)
1                    40/40000                       60/60000
2                    80/80000                       100/100000
5                    200/200000                     220/220000
7                    280/280000                     300/300000
10                   400/400000                     420/420000
Configuring the O2CB_HEARTBEAT_THRESHOLD Parameter
The O2CB_HEARTBEAT_THRESHOLD parameter defines the disk heartbeat timeout. It is expressed
as the number of 2-second iterations that must elapse before a node is considered dead.
The value of this parameter is an integer number of iterations, not a time in seconds.
Use the following formula to convert a timeout in seconds to the number of iterations:
O2CB_HEARTBEAT_THRESHOLD = (((timeout in seconds) / 2) + 1)
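As a minimal sketch, the conversion above and its inverse can be written in Python as follows. The function names are hypothetical and chosen for illustration. Rounding an odd timeout up to the next whole iteration is an assumption of this sketch, because the formula does not say how to treat a timeout that is not a multiple of 2 seconds.

```python
import math

# Length of one disk heartbeat iteration in seconds, as stated above.
ITERATION_SECONDS = 2


def heartbeat_threshold(timeout_seconds):
    """Convert a disk heartbeat timeout in seconds to a threshold value.

    Assumption: a timeout that is not a multiple of 2 seconds is rounded
    up to the next whole iteration.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return math.ceil(timeout_seconds / ITERATION_SECONDS) + 1


def threshold_to_seconds(threshold):
    """Convert a threshold value back to a timeout in seconds."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * ITERATION_SECONDS


if __name__ == "__main__":
    print(heartbeat_threshold(60))   # prints 31
    print(threshold_to_seconds(31))  # prints 60
```

For example, a disk heartbeat timeout of 60 seconds corresponds to a threshold of 31 iterations.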
If a node loses access to the devices on which the OCFS2 file system is configured,
Serviceguard must be configured to detect the failure first and take the failed node out
of the cluster. OCFS2 must be configured to detect the failure and start recovery actions only
after Serviceguard has taken the failed node out of the cluster and the remaining nodes have
formed a stable cluster. This ordering prevents OCFS2 from fencing nodes before Serviceguard
does so. Figure 2-3 describes the sequence of events that occurs when a node loses its
connection to a storage device.