Consolidating HP Serviceguard for Linux and Oracle RAC 10g Clusters, June 2005

Some extra care needs to be taken when administering the clusters. For example, if an
administrator halts only HP Serviceguard for Linux on a node and does not halt CRS (RAC)
on the same node, then a subsequent network partition may result in Serviceguard for Linux
and the applications it is protecting becoming entirely unavailable. This would happen in
the case where the only remaining Serviceguard for Linux node is chosen to be rebooted by
Oracle as a result of the network partition. The converse is true as well. The solution is
simple: whenever an administrator takes an action that can affect the membership of one
cluster (for example, “cmhaltnode” for HP Serviceguard for Linux), the corresponding command
for the other cluster should be performed on that same node (“srvctl stop” for RAC). Users
may want to write simple scripts to handle these cases.
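
A wrapper of the kind suggested above could look roughly like the following Python sketch. It pairs the two commands so that one is never issued without the other. Only the command names "cmhaltnode" and "srvctl stop" come from this paper. The arguments shown (a node name for cmhaltnode, "nodeapps -n" for srvctl), the function names, and the dry-run default are illustrative assumptions and must be adapted to what actually needs to be stopped in a given environment.

```python
import subprocess

# Command templates. Only "cmhaltnode" and "srvctl stop" are named in the paper;
# the remaining arguments are assumptions chosen for illustration.
SERVICEGUARD_HALT = ["cmhaltnode", "{node}"]
RAC_STOP = ["srvctl", "stop", "nodeapps", "-n", "{node}"]


def build_halt_plan(node, templates=(SERVICEGUARD_HALT, RAC_STOP)):
    """Return the ordered list of commands that take `node` out of both clusters."""
    return [[part.format(node=node) for part in template] for template in templates]


def halt_node_in_both_clusters(node, dry_run=True, runner=subprocess.run):
    """Run (or, by default, only print) every command in the plan for `node`.

    With dry_run=False each command is executed via `runner`, and a failing
    command raises, so a half-completed halt does not go unnoticed.
    """
    plan = build_halt_plan(node)
    for cmd in plan:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            runner(cmd, check=True)
    return plan


if __name__ == "__main__":
    # Hypothetical node name; prints the two commands without executing them.
    halt_node_in_both_clusters("node1")
```

Defaulting to a dry run lets an administrator review the paired commands before anything is halted. A matching start-up wrapper would follow the same pattern.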

Clusters with more than two nodes
While all of the examples given have been for a two-node cluster, the same approach extends
to larger clusters if necessary. If the workload is great enough to justify a larger number of
nodes, then consideration should be given to having a RAC cluster with only the database and a
Serviceguard cluster with only the applications. If more than two nodes are needed for a
combined HP Serviceguard for Linux and RAC cluster, then the failure scenarios and the
recommended configuration are the same as for a two-node cluster:
• If a single node fails, both clusters will detect the failure and will have matching
  memberships.
• If only a single heartbeat network were used, a partition that resulted in a 50-50
  split, with an equal number of nodes on each side, could cause the same problem as
  the loss of heartbeats in a two-node cluster. The recommended configuration is the
  same, both in the number of networks carrying heartbeats and in the optional use of
  the Quorum Service.
• In the case of a partition with an unequal number of nodes, the partition with
  more than 50% of the nodes will survive. If there is a multiple partition with no part
  having more than 50% of the nodes, then all nodes will go down. This is the same
  as with Serviceguard today.
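
The membership rules just described can be written down as a small decision function. The Python sketch below is only a model of the rules as stated in this section, not the algorithm either product implements. The function name and the outcome labels are assumptions made for illustration.

```python
def partition_outcomes(partition_sizes):
    """Classify each partition of a cluster after a network split.

    `partition_sizes` lists the node count of every partition. The rules
    mirror the text: a strict majority survives, an exact 50-50 split
    needs an external tie-breaker (for example the Quorum Service), and
    when no part holds more than 50% of the nodes, all nodes go down.
    """
    total = sum(partition_sizes)

    # A partition with more than 50% of the nodes survives; the rest go down.
    if any(2 * size > total for size in partition_sizes):
        return ["survives" if 2 * size > total else "goes down"
                for size in partition_sizes]

    # Two equal halves: membership alone cannot decide the winner.
    if len(partition_sizes) == 2 and partition_sizes[0] == partition_sizes[1]:
        return ["needs tie-breaker", "needs tie-breaker"]

    # Multiple partitions, none with a majority.
    return ["goes down"] * len(partition_sizes)


if __name__ == "__main__":
    for split in ([3, 1], [2, 2], [2, 1, 1]):
        print(split, partition_outcomes(split))
```

The comparison uses `2 * size > total` rather than a division so that the majority test stays in integer arithmetic.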

Conclusion
HP Serviceguard for Linux adds value in a RAC cluster by providing high availability
encapsulation for third-party applications, while reducing the hardware requirements for the
environment. HP Serviceguard for Linux can co-exist with Oracle 10g R1 RAC as a stable
consolidated cluster, given a proper choice of redundant hardware and software components
as listed below:
• Using multiple heartbeat subnets for HP Serviceguard for Linux and redundant
  heartbeat networks for HP Serviceguard for Linux and RAC (via channel-bonding)
  to prevent re-configuration on both clusters simultaneously.
• Optional use of the Quorum Service instead of Lock LUN for HP Serviceguard for
  Linux (using default parameters for both clusters) in addition to the multiple
  heartbeat paths to further minimize the possibility of a network partition resulting
  in both clusters becoming unavailable.