Sample Configuration with HP Serviceguard Extension for RAC and Oracle Real Application Clusters 11g release 2 using Cluster File System
Service (RAC-DB-IC) failure is discovered, the speed of recovery actions by impacted components
(for example, SG, Group Membership Service (GMS), Cluster Synchronization Service (CSS),
and/or RAC), and database recovery time.
– For a complete SG cluster interconnect failure, SG sees the failure within the
MEMBER_TIMEOUT timeframe.
– With SG/CFS, Group Membership Services/Atomic Broadcast (GAB) and Low Latency Transport (LLT)
share the same networks as SG, and SG sees the interconnect failure within the
MEMBER_TIMEOUT timeframe.
– HP recommends configuring the CSS heartbeat (CSS-HB) on the same network as the
Serviceguard heartbeat (SG-HB). In configurations where the CSS-HB and the SG-HB share the
same interconnect network, SG reacts to failures within the MEMBER_TIMEOUT timeframe (sooner
than the CSS timeout) and fails the node with a transfer of control (TOC).
If the CSS traffic is on an SG-monitored network (but not on an SG-HB network), SG packages
can be configured with cluster interconnect subnet monitoring to detect failure of the CSS network
and TOC the node sooner than the CSS timeout. If CSS traffic does not share the SG-HB network
and SG is not configured to monitor the CSS-HB network, CSS detects the interconnect failure
within the CSS timeout and TOCs the node. HP does not recommend this architecture because it
can cause inconsistencies in cluster membership.
– With RAC-DB-IC, in configurations where the CSS-HB, the SG-HB, and the RAC-DB-IC share the
same interconnect network, SG sees the failure within the MEMBER_TIMEOUT timeframe (sooner
than the instance membership recovery [IMR] timeout). SG packages can be configured with cluster
interconnect subnet monitoring to monitor the RAC-DB-IC and detect a failure before the IMR
timeout expires. If the RAC-DB-IC does not share the SG-HB network and SG is not configured
to monitor the RAC-DB-IC network, RAC discovers the interconnect failure within the IMR timeout.
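The detection rules in the list above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name first_detector and its parameters are hypothetical and are not part of Serviceguard, SGeRAC, or Oracle Clusterware. It only encodes which component the text says reacts first, and which timeout bounds that reaction, for CSS-HB and RAC-DB-IC traffic.

```python
from typing import Tuple

# Component that owns each traffic type and the timeout it would apply on its
# own, as named in the text. The dictionary itself is an illustrative construct.
_OWN_DETECTION = {
    "CSS-HB": ("CSS", "CSS timeout"),
    "RAC-DB-IC": ("RAC", "IMR timeout"),
}


def first_detector(traffic: str,
                   shares_sg_hb_network: bool,
                   sg_monitors_subnet: bool) -> Tuple[str, str]:
    """Return (component that reacts first, what bounds its reaction time).

    traffic              -- "CSS-HB" or "RAC-DB-IC"
    shares_sg_hb_network -- the traffic runs on the SG heartbeat network
    sg_monitors_subnet   -- an SG package monitors the traffic's subnet
    """
    owner, own_timeout = _OWN_DETECTION[traffic]
    if shares_sg_hb_network:
        # SG sees the failure within MEMBER_TIMEOUT.
        return ("SG", "MEMBER_TIMEOUT")
    if sg_monitors_subnet:
        # Cluster interconnect subnet monitoring reacts before the owner's timeout.
        return ("SG", "subnet monitoring, sooner than the " + own_timeout)
    # Nothing in SG watches this network, so the owning component detects it.
    return (owner, own_timeout)


if __name__ == "__main__":
    for traffic in ("CSS-HB", "RAC-DB-IC"):
        for shared, monitored in ((True, False), (False, True), (False, False)):
            print(traffic, shared, monitored, first_detector(traffic, shared, monitored))
```

The last branch for CSS-HB corresponds to the architecture that HP does not recommend, because CSS rather than SG ends up failing the node.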
The failover time requirement determines important timeouts, such as SG MEMBER_TIMEOUT, network
polling intervals, and cluster interconnect monitoring.
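One way to read this requirement is as a time budget: the time to discover the failure, the recovery actions, and the database recovery together must fit within the required failover time. The sketch below is a minimal illustration under that assumption. The function name, the use of seconds, and all numeric values are hypothetical and are not taken from HP or Oracle documentation.

```python
def max_detection_time(failover_requirement_s: float,
                       recovery_actions_s: float,
                       db_recovery_s: float) -> float:
    """Return the time left for failure detection within the failover budget.

    All arguments are in seconds and are illustrative estimates that would
    have to be measured for a real configuration.
    """
    budget = failover_requirement_s - recovery_actions_s - db_recovery_s
    if budget <= 0:
        raise ValueError("recovery alone exceeds the failover time requirement")
    return budget


if __name__ == "__main__":
    # Hypothetical numbers: 60 s requirement, 20 s recovery actions, 25 s database recovery.
    budget = max_detection_time(60.0, 20.0, 25.0)
    candidate_member_timeout_s = 14.0  # hypothetical value chosen for the example
    print("detection budget (s):", budget)
    print("candidate timeout fits:", candidate_member_timeout_s <= budget)
```

A detection timeout that does not fit the remaining budget suggests either shortening the timeout or revisiting the failover time requirement.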
Note: Cluster interconnect subnet monitoring provides better availability by detecting and resolving
RAC-DB-IC subnet failures quickly, and by providing services (on one node) when the Oracle
CSS-HB/RAC-IC subnet fails on all nodes.
Planning for high availability
A properly configured high availability (HA) configuration should survive a single point of failure and
continue to operate.
Public network HA
Public network high availability for clients is sustained in two ways: redundant components
and client failover.
• Redundant network interfaces and switches with local LAN failover provided by SG (or bonding by
Auto-Port Aggregation (APA)) protect against single point network failures.
• Client failover protects against failure of existing or new client sessions. These failures include node
failures (such as those caused by a power failure) and network failures (for example, failure of all
redundant network interfaces or links). Protection is available at three levels: Oracle FAN, remote VIP
failover, and client connection timeout. Clients that are FAN integrated, or that use the FAN
API, may interrupt existing sessions and fail over. Remote VIP failover is useful for non-FAN clients
attempting to connect to the local node to avoid a TCP connection timeout. The client connection
timeout is useful when establishing a client connection takes a long time for any reason.
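The client connection timeout level can be illustrated with a generic TCP sketch in Python. This is not Oracle Net, the FAN API, or VIP failover; it only shows the idea of bounding each connection attempt and moving on to the next address. The function name connect_with_timeout, the host names, and the port number in the usage example are hypothetical.

```python
import socket
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]


def connect_with_timeout(addresses: Iterable[Address],
                         timeout_s: float,
                         connector: Callable = socket.create_connection):
    """Try each address in order and return the first successful connection.

    Each attempt is bounded by timeout_s seconds, so an unreachable node costs
    at most that long before the next address is tried. The connector
    parameter exists so the logic can be exercised without a network.
    """
    last_error = None
    for address in addresses:
        try:
            return connector(address, timeout_s)
        except OSError as exc:  # covers timeouts and refused connections
            last_error = exc
    raise ConnectionError("no address reachable: {}".format(last_error))


if __name__ == "__main__":
    # Hypothetical virtual host names and port, for illustration only.
    nodes = [("node1-vip.example.com", 1521), ("node2-vip.example.com", 1521)]
    try:
        conn = connect_with_timeout(nodes, timeout_s=3.0)
        print("connected to", conn.getpeername())
        conn.close()
    except ConnectionError as err:
        print(err)
```

Without a per-attempt timeout, a client connecting to a failed node could wait for the operating system's TCP connection timeout before trying another address.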