Architecture considerations and best practices for architecting an Oracle RAC solution with Serviceguard and SGeRAC

CSS manages the Oracle cluster membership and provides its own group membership service to RAC
instances. CSS manages the cluster by controlling which nodes are members of the cluster and by
notifying other members when a node joins or leaves the cluster. If third-party cluster software such as
SGeRAC is used in the environment, CSS utilizes the group membership service provided by the
third-party cluster software.
CRS manages high availability operations within a cluster. Anything CRS manages is known as a
clustered resource. A clustered resource could be a database, a RAC instance, a listener, a virtual IP
(VIP) address, an application process, and so on. CRS manages clustered resources using each
resource’s configuration information, which is stored in a shared Oracle Cluster Registry (OCR) file.
Management includes start, stop, monitor, and failover operations.
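The four operations named above can be pictured with a small model. The following is a minimal sketch only, not Oracle's implementation: the `Resource` and `Registry` classes, their fields, and the node names are hypothetical, and an in-memory dictionary stands in for the shared OCR file.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Resource:
    """Hypothetical clustered resource record (illustration only)."""
    name: str
    kind: str               # e.g. "database", "instance", "listener", "vip"
    candidates: List[str]   # nodes allowed to host it, in preference order
    node: str = ""          # current host; empty string means stopped


class Registry:
    """In-memory stand-in for the shared configuration store (the OCR)."""

    def __init__(self) -> None:
        self.resources: Dict[str, Resource] = {}

    def register(self, resource: Resource) -> None:
        self.resources[resource.name] = resource

    def start(self, name: str, up_nodes: Set[str]) -> str:
        """Place the resource on the first candidate node that is up."""
        res = self.resources[name]
        res.node = next((n for n in res.candidates if n in up_nodes), "")
        return res.node

    def stop(self, name: str) -> None:
        self.resources[name].node = ""

    def monitor(self, name: str, up_nodes: Set[str]) -> bool:
        """Report whether the resource is running on a node that is up."""
        node = self.resources[name].node
        return node != "" and node in up_nodes

    def failover(self, name: str, up_nodes: Set[str]) -> str:
        """Restart the resource elsewhere if its current host is gone."""
        if self.monitor(name, up_nodes):
            return self.resources[name].node
        return self.start(name, up_nodes)
```

In this sketch a VIP registered with candidates `node1` and `node2` starts on `node1` and moves to `node2` when `failover` is called after `node1` leaves the set of running nodes.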
EVM publishes events generated by CRS and may run scripts when certain events occur.
Oracle Clusterware requires that the OCR and the voting disk be configured on shared raw devices,
on shared raw volumes, as files in a cluster file system, or on an ASM volume (starting with 11gR2).
Cluster protocol
CSS maintains two heartbeat mechanisms: the disk heartbeat to the voting disk, and the network
heartbeat, which is used to confirm valid node membership in the cluster. Each heartbeat
mechanism has a timeout value, expressed in seconds.
The network heartbeat timeout value is known as CSS MISSCOUNT. On UNIX® platforms the
default is 30 seconds¹.
The default value for disk heartbeat timeout varies depending on the version. The default for 11gR2
is 200 seconds.
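To make the two timeouts concrete, the following minimal sketch checks which heartbeat mechanisms have exceeded their timeout for a node. It uses the default values quoted above (30 seconds for the network heartbeat, 200 seconds for the 11gR2 disk heartbeat); the `HeartbeatState` record and the `expired_heartbeats` function are hypothetical names for illustration and do not describe how CSS is implemented.

```python
from dataclasses import dataclass
from typing import List

# Defaults quoted in the text; actual values depend on platform and version.
NETWORK_MISSCOUNT_S = 30   # CSS MISSCOUNT default on UNIX platforms
DISK_TIMEOUT_S = 200       # disk heartbeat timeout default for 11gR2


@dataclass
class HeartbeatState:
    """Hypothetical record of when a node's heartbeats were last seen (seconds)."""
    last_network_beat: float
    last_disk_beat: float


def expired_heartbeats(state: HeartbeatState, now: float) -> List[str]:
    """Return the names of the heartbeat mechanisms whose timeout has elapsed."""
    expired = []
    if now - state.last_network_beat > NETWORK_MISSCOUNT_S:
        expired.append("network")
    if now - state.last_disk_beat > DISK_TIMEOUT_S:
        expired.append("disk")
    return expired
```

With both heartbeats last seen at time 0, the sketch reports nothing at 10 seconds, only the network heartbeat at 31 seconds, and both at 201 seconds, which shows how much earlier the network timeout expires with these defaults.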
Even though Oracle Clusterware provides its own cluster membership services, Clusterware will use
the node membership services provided by SGeRAC, if SGeRAC is installed. If there is a network
partition (meaning that nodes lose communication with each other), one or more nodes may be
evicted from the cluster automatically to prevent data corruption. Depending on the type of failure, the
eviction can be triggered by SGeRAC or by the Oracle Clusterware CSS daemon. See the section
“Cluster management: who controls what and when?” later in this document for more information.
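As an illustration of what a network partition means, the sketch below groups nodes into sets that can still reach each other over working links. More than one group indicates a partition. This is a generic connected-components calculation with hypothetical node names; it does not model which nodes SGeRAC or CSS would actually evict.

```python
from typing import Dict, Iterable, List, Set, Tuple


def partitions(nodes: Iterable[str],
               links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group nodes into sets that can still reach each other over working links.

    `links` lists the node pairs whose communication path is currently up;
    every node named in a link is assumed to appear in `nodes`.
    """
    neighbours: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen: Set[str] = set()
    groups: List[List[str]] = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for peer in neighbours[node]:
                if peer not in seen:
                    seen.add(peer)
                    stack.append(peer)
        groups.append(sorted(group))
    return groups


def is_partitioned(nodes: Iterable[str],
                   links: Iterable[Tuple[str, str]]) -> bool:
    """True when the nodes have split into more than one group."""
    return len(partitions(nodes, links)) > 1
```

For four nodes where only the pairs (a, b) and (c, d) can still communicate, the sketch returns two groups, so the cluster is partitioned and some nodes would have to leave to protect shared data.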
Networking
Three networks are required in an Oracle RAC configuration:
• A private network for cluster node heartbeat communication. This is referred to as the CSS
  heartbeat.
• A private network for the Global Cache Service (GCS) and Global Enqueue Service (GES). This
  network is also known as the Cache Fusion and DLM network. From here on, the term “RAC
  interconnect” will be used for this private network.
• A public network for client connections.
Currently, Oracle Clusterware does not support redundant standby networks and does not monitor
NICs. For high availability of the public and private networks, Oracle relies either on platform
network bonding software such as the HP-UX Auto-Port Aggregation (APA) product or on networking
features provided by third-party cluster software, such as primary/standby LANs in Serviceguard.
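The primary/standby idea can be sketched as a simple selection rule: use the primary interface while its link is up, otherwise fall back to the standby. This is an illustrative assumption about the general technique, not a description of how Serviceguard or APA is implemented; the function name and the interface names `lan0` and `lan1` are hypothetical.

```python
from typing import Dict, Optional


def select_active(primary: str, standby: str,
                  link_up: Dict[str, bool]) -> Optional[str]:
    """Pick the interface that should carry traffic, or None if both are down.

    `link_up` maps interface names to their monitored link state; an
    interface missing from the map is treated as down.
    """
    if link_up.get(primary, False):
        return primary
    if link_up.get(standby, False):
        return standby
    return None


# Example: the primary link has failed, so traffic moves to the standby.
active = select_active("lan0", "lan1", {"lan0": False, "lan1": True})
```

Because the text notes that Oracle Clusterware does not monitor NICs itself, the link state in this sketch would have to come from the bonding software or the third-party cluster software.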
¹ Based on Oracle Support (http://support.oracle.com, login required) Bulletin ID: 294430.1