Best Practices for SGeRAC and Oracle RAC on HP-UX 11i, March 2009
limitation where, if all interconnects fail (e.g. the primary and standby switches fail at the same time), all
nodes are halted. If simultaneous failure of both switches is a concern, Serviceguard
supports multiple standbys, and additional standby switches may be added.
Without subnet monitoring, double network failures on the CSS-HB network are discovered by the
CSS-HB timeout, and CSS determines which node to evict.
Starting with A.11.18, SGeRAC introduced Cluster Interconnect Subnet (CIS) monitoring to address
this limitation with subnet monitoring. CLUSTER_INTERCONNECT_SUBNET can be used in
conjunction with the node fail fast enabled option to monitor the CSS-HB network. When CSS is
configured with MNP packages and the CSS-HB subnet is monitored using
CLUSTER_INTERCONNECT_SUBNET, a failure of the monitored subnet causes the MNP package to halt the
nodes where the subnet has failed, while at least one MNP package instance and node remain.
Note:
For 11gR1, when using CIS for CSS-HB in configurations of three or more
nodes and nodes are halted because of a fail fast from CIS, new
connections on the RAC instance may be delayed up to ten minutes.
When CIS monitoring is used on the CSS-HB network, the installed default values for various timeouts
should be sufficient for most installations. If you need to change the Serviceguard member timeout,
Serviceguard heartbeat interval or CSS-HB timeout, the CSS-HB timeout should be tuned to provide an
opportunity for Serviceguard to complete reconfiguration and update CSS through group membership
service (GMS) prior to CSS timeout (see footnote 6).
Optionally, if the RAC-DB-IC timeout is changed, it
should be 15 seconds above the CSS-HB timeout.
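The 15-second relation between the two timeouts can be sketched as a small helper. This is a minimal illustration only: the function name and the 60-second input are hypothetical and are not taken from Serviceguard or Oracle tooling or defaults.

```python
# Margin between the CSS-HB timeout and the RAC-DB-IC timeout, as stated in the text.
CSS_TO_RAC_DB_IC_MARGIN_S = 15


def rac_db_ic_timeout(css_hb_timeout_s):
    """Return the RAC-DB-IC timeout (seconds) that sits 15 s above the CSS-HB timeout.

    Hypothetical helper for illustration; it only encodes the arithmetic above.
    """
    if css_hb_timeout_s <= 0:
        raise ValueError("CSS-HB timeout must be positive")
    return css_hb_timeout_s + CSS_TO_RAC_DB_IC_MARGIN_S


if __name__ == "__main__":
    # 60 is an arbitrary example value, not a recommended setting.
    print(rac_db_ic_timeout(60))  # 75
```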
Guidelines for changing cluster parameters
These are general guidelines for changing cluster parameters for timeouts depending on whether
Cluster Interconnect Subnet monitoring is used.
When Cluster Interconnect Subnet monitoring is used to monitor the CSS-HB network and the default
value of either of the following cluster parameters needs to be changed:
• Oracle Clusterware parameter CSS MISSCOUNT
• Serviceguard cluster configuration parameter MEMBER_TIMEOUT
then the CSS MISSCOUNT parameter should be greater than:
• For SLVM: (number of nodes – 1) times (F + SLVM timeout) + 15 seconds
• For CVM/CFS: (two times number of nodes – 1) times F + 15 seconds
• When both SLVM and CVM/CFS are used, take the maximum of the two calculations above
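The lower bound above can be worked through with a short calculation. The sketch below is illustrative only: the function name and the sample values (4 nodes, F = 30 s, SLVM timeout = 60 s) are hypothetical, and the CVM/CFS expression is read here as (2 × number of nodes − 1) × F, which is an assumption about the wording and the larger of its two possible readings.

```python
def css_misscount_lower_bound(nodes, failover_s, slvm_timeout_s=None, uses_cvm_cfs=False):
    """Return the value (seconds) that CSS MISSCOUNT should exceed when CIS is used.

    nodes          -- number of cluster nodes
    failover_s     -- F, the Serviceguard failover time in seconds
    slvm_timeout_s -- SLVM timeout in seconds, or None if SLVM is not used
    uses_cvm_cfs   -- True if CVM/CFS is used
    """
    if nodes < 2:
        raise ValueError("a cluster needs at least two nodes")

    bounds = []
    if slvm_timeout_s is not None:
        # SLVM: (number of nodes - 1) * (F + SLVM timeout) + 15 seconds
        bounds.append((nodes - 1) * (failover_s + slvm_timeout_s) + 15)
    if uses_cvm_cfs:
        # CVM/CFS, ASSUMED reading: (2 * number of nodes - 1) * F + 15 seconds
        bounds.append((2 * nodes - 1) * failover_s + 15)
    if not bounds:
        raise ValueError("specify SLVM, CVM/CFS, or both")

    # When both storage managers are used, the larger bound applies.
    return max(bounds)


if __name__ == "__main__":
    # Hypothetical values: 4 nodes, F = 30 s, SLVM timeout = 60 s.
    print(css_misscount_lower_bound(4, 30, slvm_timeout_s=60))                     # 285
    print(css_misscount_lower_bound(4, 30, uses_cvm_cfs=True))                     # 225
    print(css_misscount_lower_bound(4, 30, slvm_timeout_s=60, uses_cvm_cfs=True))  # 285
```

With these sample inputs the SLVM term dominates, so CSS MISSCOUNT would need to be set above 285 seconds.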
Note 1:
F is the Serviceguard failover time as given by the
max_reformation_duration field in the output of
cmviewcl -v -f line.
6 The relation when using CIS differs from the relation without CIS because the CSS-HB timeout must take into consideration scenarios where
multiple nodes fail fast sequentially until at least one MNP instance remains.