members. This ensures the cluster is viable and that Oracle can continue to operate on the primary
node. Some failure cases can cause the heartbeat to become erratic or unreliable, so modern clustering
products provide a second check-in mechanism, which ensures that quorum is maintained. With quorum
voting, each cluster member identifies itself by casting a vote, in the form of a simple write to a shared
voting disk, also called a quorum disk. The combination of heartbeat and quorum disk minimizes the risk
of split-brain cluster states. A split-brain state occurs when both nodes believe they hold the correct
cluster state, so both access the shared storage. Because split-brain states create the highest risk of
database corruption, preventing them is the functional core of the cluster.
The two most common examples of Oracle Cold Failover are HP ServiceGuard and Veritas Cluster
Server (VCS). Red Hat Cluster Suite’s implementation for Oracle closely models these products, so
customers who know them will find Red Hat Cluster Suite Oracle HA immediately familiar. Of course,
the devil is most definitely in the details.
5.1. Red Hat Cluster Suite HA
Red Hat Cluster Suite contains all the requisite components to implement Oracle HA: heartbeat, quorum
disk voting, fencing and a resource harness to relocate the Oracle instance when necessary. The major
differences between how Red Hat Cluster Suite is set up for RAC and how it is set up for a single
instance involve the appropriate timeout settings and the configuration of the resource harness, aptly
named rgmanager.
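As an illustrative sketch of that resource harness, a cold failover Oracle service in
/etc/cluster/cluster.conf might be defined as follows. The domain, node names, address, device and
Oracle home paths are assumptions for this example, and the oracledb resource attributes should be
verified against the agent shipped in /usr/share/cluster/oracledb.sh:

<rm>
  <failoverdomains>
    <failoverdomain name="oracle_domain" ordered="1" restricted="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <!-- On failure, rgmanager relocates the whole service: IP, file system, then the instance -->
  <service name="oracle_ha" domain="oracle_domain" autostart="1" recovery="relocate">
    <ip address="192.168.1.100" monitor_link="1"/>
    <fs name="oradata" device="/dev/mapper/oradata" mountpoint="/u01/oradata" fstype="ext3" force_unmount="1"/>
    <oracledb name="ORCL" home="/u01/app/oracle/product/10.2.0/db_1" user="oracle" type="10g"/>
  </service>
</rm>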
5.2. Red Hat Cluster Suite Timers
When Oracle RAC is installed, Red Hat Cluster Suite must interact with Oracle Clusterware, but it
remains in control of the timeouts and, ultimately, the fencing. When Oracle HA is configured, Red Hat
Cluster Suite is likewise in charge, so the timeouts are very similar.
Tip
It is critical that the Red Hat Cluster Suite heartbeat service operates over the private, bonded
network, not the public network. If the private network fails for a node, then this node must be
removed from the cluster.
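A minimal sketch of such a bonded private interconnect, assuming RHEL-style network scripts with
hypothetical device names and addresses; note that cman heartbeats over whichever network the
clusternode names in cluster.conf resolve to, so those names should resolve to the bond0 address:

# /etc/sysconfig/network-scripts/ifcfg-bond0 (private interconnect)
DEVICE=bond0
IPADDR=192.168.100.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
# mode=1 is active-backup, the usual choice for a heartbeat network
BONDING_OPTS="mode=1 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for each slave NIC)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none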
All installations will have subtly different timeout requirements, but start with these recommended
settings:
<cluster config_version="11" name="dl585">
  <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
  <quorumd device="/dev/mapper/qdisk" interval="2" log_level="5" tko="8" votes="1"/>
  <cman expected_votes="3" two_node="0"/>
  <totem token="33000"/>
In this example, the quorum disk provides the first level of failure detection, with a timeout of 16
seconds; that is, eight intervals of 2 seconds. The tko parameter stands for Technical Knock Out (a
boxing metaphor) and is the number of missed intervals before a node is evicted. The CMAN heartbeat
timeout must be more than twice the quorum disk timeout; we choose 33 seconds, written in
milliseconds as token="33000". This delay gives the quorum daemon adequate time to establish which
node is the master during a failure, or if there is a load spike that might delay voting. The
expected_votes parameter is set to the number of nodes + 1, the extra vote being the quorum disk's.
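The relationships among these settings can be checked with simple arithmetic:

qdisk eviction time = interval × tko           = 2 s × 8   = 16 s
totem token         > 2 × qdisk eviction time  = 2 × 16 s  = 32 s, rounded up to 33000 ms
expected_votes      = node votes + qdisk vote  = 2 + 1     = 3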