OCFS2 Integration with HP Serviceguard for Linux Administrator's Guide, First Edition, November 2008
3. O2CB_KEEPALIVE_DELAY_MS
4. O2CB_RECONNECT_DELAY_MS
These parameters and the timeout values are stored in the file /etc/sysconfig/o2cb on all
the cluster nodes.
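As a rough illustration of how these settings can be inspected, the following Python sketch reads KEY=VALUE lines from text in the style of /etc/sysconfig/o2cb. It assumes the file uses plain shell-style assignments. The function name parse_o2cb_settings and the sample values are hypothetical and are not taken from this guide.

```python
def parse_o2cb_settings(text):
    """Return a dict of O2CB_* settings found in shell-style KEY=VALUE text."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        key = key.strip()
        if key.startswith("O2CB_"):
            # Drop optional surrounding quotes around the value.
            settings[key] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical sample content; the values are illustrative only.
SAMPLE = """\
# O2CB timeouts (illustrative values)
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS="2000"
"""

settings = parse_o2cb_settings(SAMPLE)
print(settings)
```

On a cluster node, the same function could be given the contents of the real file, for example parse_o2cb_settings(open("/etc/sysconfig/o2cb").read()). Running it on every node is one way to confirm that the values match across the cluster.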
Configuring the O2NET_IDLE_TIMEOUT Parameter
The O2NET_IDLE_TIMEOUT parameter is the OCFS2 network idle timeout. It specifies the time,
in milliseconds, before a network connection is considered dead.
In the event of a network failure, the Serviceguard cluster manager must detect the failure
first and start the procedures needed to stabilize the cluster. OCFS2 must therefore be
configured to detect the network failure only after HP Serviceguard for Linux has formed a
stable cluster. Figure 2-2 shows the time at which HP Serviceguard for Linux detects a network
failure and the time at which OCFS2 detects the same failure.
Figure 2-2 OCFS2 Network Idle Timeout Definitions
[Figure: timeline marked with the points t0 through t4]
t0  Network links are lost.
t1  Serviceguard realizes network failure with loss of heartbeat.
t2  Serviceguard TOC's node.
t3  Stable cluster forms with remaining nodes.
t4  OCFS2 realizes loss of network connectivity.
In the figure, t0 through t4 mark the points in time at which the events that follow a network
failure occur. Following is the sequence of events:
1. At t0, the network links that carry the Serviceguard heartbeat, OCFS2 data, and the TCP
connection are lost.
2. At t1, HP Serviceguard for Linux recognizes that there is a network failure, due to the loss
of the heartbeat.
3. At t2, the node times out, and HP Serviceguard performs a Transfer of Control (TOC) on the
node, forcing it to restart and rejoin the cluster.
4. At t3, the Serviceguard Cluster Manager forms a stable cluster with the remaining nodes.
5. At t4, OCFS2 realizes that there is a network failure. By this time, however, Serviceguard
has already formed a stable cluster.
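The ordering requirement in this sequence (t4 must fall after t3) can be expressed as a simple check. The following Python sketch is illustrative only. It assumes that both the OCFS2 idle timeout and the time Serviceguard needs to form a stable cluster are measured from t0, and the function name and sample values are hypothetical.

```python
def ocfs2_detects_after_stable_cluster(idle_timeout_ms, reconfiguration_time_s):
    """Return True if OCFS2 would declare the connection dead (t4) only
    after Serviceguard has formed a stable cluster (t3).

    Assumption: both intervals start at t0, the moment the links are lost.
    """
    if idle_timeout_ms < 0 or reconfiguration_time_s < 0:
        raise ValueError("times must be non-negative")
    t3 = reconfiguration_time_s        # seconds after t0
    t4 = idle_timeout_ms / 1000.0      # milliseconds converted to seconds
    return t4 > t3


# Hypothetical values: the cluster needs 28 s to stabilize.
print(ocfs2_detects_after_stable_cluster(48000, 28))  # True
print(ocfs2_detects_after_stable_cluster(10000, 28))  # False
```

A result of False indicates that OCFS2 could react to the failure before Serviceguard has finished reforming the cluster, which is the situation this section is meant to prevent.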
The time interval between t0 and t3 is known as the Serviceguard Cluster Reconfiguration Time
(CRT). It is recommended that the value of the OCFS2 Network Idle Timeout parameter be set
to 20 seconds more than the time taken by Serviceguard to form a stable cluster. Because the
parameter is specified in milliseconds, use the following formula to determine the value to be
set for the O2CB_NET_IDLE_TIMEOUT parameter:
O2CB_NET_IDLE_TIMEOUT = [Serviceguard Cluster Reconfiguration Time (in seconds) + 20 seconds] x 1000
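The formula can be worked through in a few lines of Python. This is a minimal sketch rather than a tool shipped with either product: the function and constant names are hypothetical, and the only inputs are the reconfiguration time in seconds and the 20-second margin stated above.

```python
SAFETY_MARGIN_S = 20  # margin recommended in the text above, in seconds


def recommended_idle_timeout_ms(reconfiguration_time_s):
    """Apply (CRT in seconds + 20 seconds) x 1000 and return milliseconds."""
    if reconfiguration_time_s <= 0:
        raise ValueError("reconfiguration time must be positive")
    return int(round((reconfiguration_time_s + SAFETY_MARGIN_S) * 1000))


# A reconfiguration time of 28 seconds gives (28 + 20) x 1000.
print(recommended_idle_timeout_ms(28))  # 48000
```

The result is in milliseconds, the unit in which the idle timeout is specified. The reconfiguration time itself must be measured or estimated for the actual cluster.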
For example:
If an HP Serviceguard cluster with 4 nodes is configured with a default heartbeat of 1 second
and a node timeout value of 2 seconds, then a stable cluster is formed in about 28 seconds. So