Using Serviceguard Extension for RAC, 10th Edition, April 2013

When Cluster Interconnect Subnet Monitoring is used

The Cluster Interconnect Subnet Monitoring feature is used to monitor the CSS-HB network.

If the Serviceguard cluster configuration parameter MEMBER_TIMEOUT is changed, the Oracle Clusterware parameter CSS misscount must also be changed.

Set CSS misscount to a value greater than the following:

• For SLVM: (number of nodes – 1) × (F + SLVM timeout) + 15 seconds

• For CVM/CFS: (2 × number of nodes – 1) × F + 15 seconds

• When both SLVM and CVM/CFS are used, use the larger of the two values above.
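The guidelines above can be sketched as a small calculation. This is a minimal illustration, not part of Serviceguard or Oracle Clusterware: the function name and parameters are hypothetical, and it assumes that F and the SLVM timeout have already been obtained and converted to seconds.

```python
from typing import Optional


def min_misscount_monitored(
    nodes: int,
    failover_s: float,
    slvm_timeout_s: Optional[float] = None,
    uses_cvm_cfs: bool = False,
) -> float:
    """Return the bound (in seconds) that CSS misscount should exceed
    when Cluster Interconnect Subnet Monitoring is used.

    nodes          -- number of nodes in the cluster
    failover_s     -- F, the Serviceguard failover time (assumed seconds)
    slvm_timeout_s -- SLVM timeout (assumed seconds); None if SLVM is not used
    uses_cvm_cfs   -- True if CVM/CFS is used
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")

    bounds = []
    if slvm_timeout_s is not None:
        # SLVM: (number of nodes - 1) x (F + SLVM timeout) + 15 seconds
        bounds.append((nodes - 1) * (failover_s + slvm_timeout_s) + 15)
    if uses_cvm_cfs:
        # CVM/CFS: (2 x number of nodes - 1) x F + 15 seconds
        bounds.append((2 * nodes - 1) * failover_s + 15)
    if not bounds:
        raise ValueError("specify SLVM, CVM/CFS, or both")

    # When both storage managers are used, the larger bound applies.
    return max(bounds)
```

For example, with hypothetical values of 4 nodes, F = 30 seconds, and an SLVM timeout of 60 seconds, the SLVM bound is 3 × 90 + 15 = 285 seconds and the CVM/CFS bound is 7 × 30 + 15 = 225 seconds, so CSS misscount should be set above 285 seconds when both are used.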

When Cluster Interconnect Subnet Monitoring is not used

The Cluster Interconnect Subnet Monitoring feature is not used to monitor the CSS-HB subnet.

If the Serviceguard cluster configuration parameter MEMBER_TIMEOUT is changed, the Oracle Clusterware parameter CSS misscount must also be changed.

Set CSS misscount to a value greater than the following:

• For SLVM: F + SLVM timeout + 15 seconds

• For CVM/CFS: 3 × F + 15 seconds

• When both SLVM and CVM/CFS are used, use the larger of the two values above.

NOTE:

1. F is the Serviceguard failover time, as reported in the max_reformation_duration field of the cmviewcl -v -f line output.

2. The SLVM timeout is documented in the white paper LVM link and Node Failure Recovery Time.
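The guidelines for the case where Cluster Interconnect Subnet Monitoring is not used can be sketched the same way. This is an illustrative calculation only: the function name and parameters are hypothetical, and it assumes that F and the SLVM timeout have already been obtained from the sources named in the note and converted to seconds.

```python
from typing import Optional


def min_misscount_unmonitored(
    failover_s: float,
    slvm_timeout_s: Optional[float] = None,
    uses_cvm_cfs: bool = False,
) -> float:
    """Return the bound (in seconds) that CSS misscount should exceed
    when Cluster Interconnect Subnet Monitoring is not used.

    failover_s     -- F, the Serviceguard failover time (assumed seconds)
    slvm_timeout_s -- SLVM timeout (assumed seconds); None if SLVM is not used
    uses_cvm_cfs   -- True if CVM/CFS is used
    """
    bounds = []
    if slvm_timeout_s is not None:
        # SLVM: F + SLVM timeout + 15 seconds
        bounds.append(failover_s + slvm_timeout_s + 15)
    if uses_cvm_cfs:
        # CVM/CFS: 3 x F + 15 seconds
        bounds.append(3 * failover_s + 15)
    if not bounds:
        raise ValueError("specify SLVM, CVM/CFS, or both")

    # When both storage managers are used, the larger bound applies.
    return max(bounds)
```

With hypothetical values of F = 40 seconds and an SLVM timeout of 60 seconds, the SLVM bound is 115 seconds and the CVM/CFS bound is 135 seconds, so a cluster using both should set CSS misscount above 135 seconds.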

Limitations of Cluster Communication Network Monitor

The Cluster Interconnect Subnet Monitoring feature does not coordinate with any feature that handles subnet failures (including itself). As a result, the handling of multiple simultaneous subnet failures may result in a loss of services, for example:

• A double switch failure causes the simultaneous failure of the CSS-HB subnet and the SG-HB subnet on all nodes of a two-node cluster (assuming the CSS-HB subnet is different from the SG-HB subnet). Serviceguard may choose to retain one node, while the failure handling of interconnect subnets may choose to retain the other node to handle the CSS-HB network failure. As a result, both nodes go down.

NOTE: To reduce the risk of multiple subnets failing simultaneously, each subnet must have its own networking infrastructure (including networking switches).

• A double switch failure causes the simultaneous failure of the CSS-HB subnet and the RAC-IC network on all nodes (assuming the CSS-HB subnet is different from the RAC-IC network). The failure handling of interconnect subnets may choose to retain one node for the CSS-HB subnet failure and to retain the RAC instance on a different node for the RAC-IC subnet failure. Because a database instance depends on the clusterware running on its node, the database instance eventually does not run on any node.
