Best Practices for SGeRAC and Oracle RAC on HP-UX 11i, March 2009

limitation where if all interconnects fail (e.g., the primary and standby switches fail at the same time), all

nodes are halted. If there is a concern with simultaneous failure of both switches, Serviceguard

supports multiple standbys, and additional standby switches may be added.

Without subnet monitoring, double network failures on the CSS-HB network will be discovered by

the CSS-HB timeout, and CSS determines which node to evict.

Starting with A.11.18, SGeRAC introduced Cluster Interconnect Subnet (CIS) monitoring to address

the limitation with subnet monitoring. CLUSTER_INTERCONNECT_SUBNET can be used in

conjunction with the node fail fast enabled option for monitoring the CSS-HB network. When CSS is

configured with MNP packages and the CSS-HB subnet is monitored using

CLUSTER_INTERCONNECT_SUBNET, if the monitored subnet fails, the MNP package will halt the

nodes where the monitored subnet has failed, and at least one MNP package and node will remain.

Note:

For 11gR1, when using CIS for CSS-HB in configurations of three or more

nodes, if nodes are halted because of a fail fast from CIS, new

connections on the RAC instance may be delayed up to ten minutes.

When CIS monitoring is used on the CSS-HB network, the installed default values for various timeouts

should be sufficient for most installations. If you need to change the Serviceguard member timeout,

Serviceguard heartbeat interval or CSS-HB timeout, the CSS-HB timeout should be tuned to provide an

opportunity for Serviceguard to complete reconfiguration and update CSS through group membership

service (GMS) prior to CSS timeout.

Optionally, if the RAC-DB-IC timeout is changed, the RAC-DB-IC timeout

should be 15 seconds above the CSS-HB timeout.

Guidelines for changing cluster parameters

These are general guidelines for changing cluster parameters for timeouts depending on whether

Cluster Interconnect Subnet monitoring is used.

When Cluster Interconnect Subnet monitoring is used to monitor the CSS-HB network and the default

value of either of the following cluster parameters needs to be changed:

• Oracle Clusterware parameter CSS MISSCOUNT

• Serviceguard cluster configuration parameter MEMBER_TIMEOUT

Then the CSS MISSCOUNT parameter should be greater than:

• For SLVM: (number of nodes – 1) times (F + SLVM timeout) + 15 seconds

• For CVM/CFS: (two times number of nodes – 1) times F + 15 seconds

• When both SLVM and CVM/CFS are used, take the maximum of the above two calculations

Note 1:

F is the Serviceguard failover time as given by the

max_reformation_duration field in the output of cmviewcl -v -f line.

The relation when using CIS differs from the relation without CIS because the CSS-HB timeout must take into consideration scenarios where

multiple nodes may fail fast sequentially until at least one MNP instance remains.
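
The CSS MISSCOUNT guideline above can be worked through as a short calculation. The following is a minimal sketch in Python, not part of any Serviceguard or Oracle tooling. The function name and its parameters are hypothetical. It assumes that all values are in seconds and that the CVM/CFS expression is read as ((2 x number of nodes) - 1) x F + 15; if the intended reading is 2 x (number of nodes - 1), adjust that line accordingly. The sample values are illustrative only.

```python
def min_css_misscount(num_nodes, failover_time, slvm_timeout=None,
                      uses_cvm_cfs=False, margin=15):
    """Return the value (in seconds) that CSS MISSCOUNT should exceed
    when CIS monitoring is used on the CSS-HB network.

    num_nodes     -- number of nodes in the cluster
    failover_time -- F, the Serviceguard failover time, in seconds
    slvm_timeout  -- SLVM timeout in seconds, or None if SLVM is not used
    uses_cvm_cfs  -- True if CVM/CFS is used
    margin        -- the 15 seconds added in both guideline formulas
    """
    if num_nodes < 2:
        raise ValueError("a cluster needs at least two nodes")

    bounds = []
    if slvm_timeout is not None:
        # SLVM: (number of nodes - 1) times (F + SLVM timeout) + 15 seconds
        bounds.append((num_nodes - 1) * (failover_time + slvm_timeout) + margin)
    if uses_cvm_cfs:
        # CVM/CFS, ASSUMED reading: ((2 * number of nodes) - 1) times F + 15 seconds
        bounds.append((2 * num_nodes - 1) * failover_time + margin)
    if not bounds:
        raise ValueError("specify slvm_timeout, uses_cvm_cfs, or both")

    # When both storage managers are used, take the larger of the two results.
    return max(bounds)


# Illustrative values only: 4 nodes, F = 30 s, SLVM timeout = 10 s.
slvm_only = min_css_misscount(4, 30, slvm_timeout=10)                  # 3 * 40 + 15
cvm_only = min_css_misscount(4, 30, uses_cvm_cfs=True)                 # 7 * 30 + 15
both = min_css_misscount(4, 30, slvm_timeout=10, uses_cvm_cfs=True)    # larger of the two
print(slvm_only, cvm_only, both)
```

CSS MISSCOUNT should be set above the returned value. F must be converted to seconds before it is passed in if the cmviewcl output reports it in another unit.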