OCFS2 Integration with HP Serviceguard for Linux Administrator's Guide, First Edition, November 2008

according to figure 1, the time period between t0 and t3 is 28 seconds. So, using the formula, the

O2CB_NET_IDLE_TIMEOUT parameter is set to 48000 milliseconds.

O2CB_IDLE_TIMEOUT_MS = (28 seconds + 20 seconds) x 1000

O2CB_IDLE_TIMEOUT_MS = 48000 milliseconds
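The arithmetic above can be sketched in a few lines of Python. This is only an illustrative helper, not part of Serviceguard or OCFS2: the function name `idle_timeout_ms` and the constant `MARGIN_SECONDS` are hypothetical, and the 20-second margin is taken from the worked example above.

```python
# Hypothetical helper: derive the OCFS2 network idle timeout (in milliseconds)
# from the Serviceguard cluster reconfiguration time (in seconds).
MARGIN_SECONDS = 20  # margin added in the worked example above


def idle_timeout_ms(reconfig_time_s: int, margin_s: int = MARGIN_SECONDS) -> int:
    """Return (reconfiguration time + margin) converted to milliseconds."""
    if reconfig_time_s <= 0:
        raise ValueError("reconfiguration time must be a positive number of seconds")
    return (reconfig_time_s + margin_s) * 1000


if __name__ == "__main__":
    # 28 seconds is the t0-to-t3 period from the example above.
    print(idle_timeout_ms(28))
```

Called with 28, the helper returns 48000, matching the value computed by hand.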

Table 2-1 lists the HP Serviceguard heartbeat intervals, the corresponding cluster
reconfiguration times, and the recommended values for the O2CB_NET_IDLE_TIMEOUT parameter.
Use this table to set the O2CB_NET_IDLE_TIMEOUT parameter.

Table 2-1 HP Serviceguard and OCFS2 Values

Configuration with 8 nodes or fewer:

Heartbeat Interval    Cluster Reconfiguration Time    OCFS2 Network Idle Timeout
(seconds)             (seconds/milliseconds)          (seconds/milliseconds)
1                     28/28000                        48/48000
2                     56/56000                        76/76000
5                     140/140000                      160/160000
7                     196/196000                      216/216000
10                    280/280000                      300/300000

Configuration with more than 8 nodes and fewer than 16 nodes:

Heartbeat Interval    Cluster Reconfiguration Time    OCFS2 Network Idle Timeout
(seconds)             (seconds/milliseconds)          (seconds/milliseconds)
1                     40/40000                        60/60000
2                     80/80000                        100/100000
5                     200/200000                      220/220000
7                     280/280000                      300/300000
10                    400/400000                      420/420000

Configuring the O2CB_HEARTBEAT_THRESHOLD Parameter

The O2CB_HEARTBEAT_THRESHOLD parameter defines the disk heartbeat timeout. It is defined
as the number of 2-second iterations before a node is considered dead.
The value of this parameter is an integer number of iterations, which corresponds to a timeout
in seconds. Use the following formula to convert a timeout in seconds to the number of iterations:

O2CB_HEARTBEAT_THRESHOLD = (((timeout in seconds) / 2) + 1)
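The conversion can be sketched in Python as follows. The function names are hypothetical and exist only for illustration. Rounding an odd timeout up to the next whole iteration is an assumption made here so that the result is always an integer; the formula above does not say how to handle that case.

```python
# Hypothetical helpers for converting between a disk heartbeat timeout in
# seconds and an O2CB_HEARTBEAT_THRESHOLD iteration count.
ITERATION_SECONDS = 2  # each heartbeat iteration lasts 2 seconds


def heartbeat_threshold(timeout_s: int) -> int:
    """Return ((timeout in seconds) / 2) + 1 as an integer iteration count."""
    if timeout_s <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    # Assumption: round odd timeouts up so the threshold is a whole number.
    iterations = -(-timeout_s // ITERATION_SECONDS)  # ceiling division
    return iterations + 1


def threshold_to_seconds(threshold: int) -> int:
    """Invert the formula: timeout in seconds = (threshold - 1) * 2."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * ITERATION_SECONDS


if __name__ == "__main__":
    print(heartbeat_threshold(60))
    print(threshold_to_seconds(31))
```

For example, a 60-second timeout corresponds to a threshold of 31 iterations, and a threshold of 31 converts back to 60 seconds.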

If a node loses access to the devices on which the OCFS2 file system is configured,
Serviceguard must be configured to detect the failure first and remove the failed node from
the cluster. OCFS2 must be configured to detect the failure and start recovery actions only
after Serviceguard has removed the failed node and a stable cluster has formed with
the remaining nodes. This ordering prevents OCFS2 from fencing nodes before Serviceguard
does so. Figure 2-3 shows the sequence of events that occurs when a node loses its connection
to a storage device.
