within application servers must be configured correctly, so failover delay is minimized.
The back-side network is a private, dedicated network that should be configured as a four-port VLAN
if a non-private switch is used.
Most customers buy dual-ported NICs, which are not as reliable as two single-ported NICs. However,
bonding ports across different drivers is also not recommended (bonding a tg3 port and an e1000
port, for instance). If possible, use two outboard single-ported NICs. Servers whose outboard
ports use the same driver as the built-in ports (all e1000 ports, for instance) can safely cross-bond.
Connecting the ports to two different switches may also not work in some cases, so creating a fully
redundant bonded NIC pathway is harder than it should be. Since the back-side network exists primarily
to carry heartbeat traffic, a server whose bonded NIC fails is simply fenced, even though the server
itself is still up. Statistically, the cluster might fence a little more often, but that's about it.
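For reference, the failover bond described here might be set up on Red Hat Enterprise Linux 5 along
the following lines; the interface names (eth2, eth3), the bond0 label, and the addressing are
illustrative assumptions, not values from this example configuration:

    # /etc/modprobe.conf -- load the bonding driver for bond0;
    # mode=1 is active-backup (failover), miimon polls link state every 100 ms
    alias bond0 bonding
    options bond0 mode=1 miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface
    DEVICE=bond0
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

    # /etc/sysconfig/network-scripts/ifcfg-eth2 -- one slave (repeat for eth3)
    DEVICE=eth2
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

With mode=1, only one slave carries traffic at a time, which matches the failover-oriented
recommendation above.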
2.4. RAC/GFS Considerations
Oracle Clusterware implements Virtual IP routing so that target IP addresses of the failed node can
be quickly taken over by the surviving node. This means new connections see little or no delay.
In the GFS/RAC cluster, Oracle uses the back-side network to implement Cache Fusion through the
Global Cache Service (GCS), and database blocks can be moved between nodes over this link. This can
place extra load on the link, and for certain workloads, a second dedicated back-side network might
be required.
Bonded GCS links that use LACP (Link Aggregation Control Protocol) to aggregate multiple GbE links
for higher capacity are supported, but not extensively tested. Customers may also run the simple
two-NIC bond in load-balance mode, but the recommendation is to use this bond for failover,
especially in the two-node case.
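A minimal sketch of the LACP variant, assuming the switch ports are grouped into an 802.3ad
aggregation (the switch-side setup is vendor-specific and not shown):

    # /etc/modprobe.conf -- mode=4 is 802.3ad (LACP) aggregation;
    # substitute this for the mode=1 line shown earlier
    alias bond0 bonding
    options bond0 mode=4 miimon=100 lacp_rate=1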
Oracle GCS can also be implemented over InfiniBand using the Reliable Datagram Sockets (RDS)
protocol. This provides an extremely low latency, memory-to-memory connection. This strategy is
more often required in high node-count clusters, which implement data warehouses. In these larger
clusters, the inter-node traffic (and the GCS coherency protocol) easily exhausts the capacity of
conventional GbE/UDP links.
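Whichever interconnect carries GCS traffic, Oracle can be pointed at it explicitly through the
CLUSTER_INTERCONNECTS initialization parameter. A minimal sketch, assuming instance names orcl1 and
orcl2 and addresses on a hypothetical 192.168.2.0/24 dedicated interconnect:

    -- set per instance in the spfile; each instance names its own
    -- address on the dedicated interconnect network
    ALTER SYSTEM SET cluster_interconnects='192.168.2.1' SCOPE=spfile SID='orcl1';
    ALTER SYSTEM SET cluster_interconnects='192.168.2.2' SCOPE=spfile SID='orcl2';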
Oracle RAC has other strategies to preserve existing sessions and transactions from the failed node
(Oracle Transparent Session and Application Migration/Failover). Most customers do not implement
these features. However, they are available, and near non-stop failover is possible with RAC. These
features are not available in the Cold Failover configuration, so the client tier must be configured
accordingly.
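As an illustration of the client-side piece, Transparent Application Failover is typically enabled
through a FAILOVER_MODE clause in tnsnames.ora. The alias, virtual host names, service name, and
retry values below are assumptions for the sketch, not values from this configuration:

    # tnsnames.ora entry with TAF; in-flight SELECTs resume on the survivor
    orcl_taf =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
          (LOAD_BALANCE = yes))
        (CONNECT_DATA =
          (SERVICE_NAME = orcl)
          (FAILOVER_MODE =
            (TYPE = SELECT)
            (METHOD = BASIC)
            (RETRIES = 20)
            (DELAY = 5))))

The addresses point at the Clusterware virtual IPs described above, so a reconnect lands on the
surviving node without waiting for a dead-host TCP timeout.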
Oracle RAC is quite expensive, but can provide that last 5% of uptime that might make the extra cost
worth every nickel. A simple two-node Red Hat Cluster Suite Oracle Failover cluster requires only one
Enterprise Edition license. The two-node RAC/GFS cluster requires two Enterprise Edition licenses
and a separately priced license for RAC (and Partitioning).
2.5. Fencing Configuration
Fencing is a technique used to remove a cluster member from an active cluster, as determined by loss of
communication with the cluster. There are two fail-safe mechanisms in a typical Oracle HA configuration:
the quorum voting disk service, qdisk, and the cman heartbeat mechanism that operates over the
private, bonded network. If either node fails to "check in" within a prescribed time, actions are taken to
remove, or fence, the node from the rest of the active cluster. Fencing is the most important job that a
cluster product must do. Inconsistent or unreliable fencing can result in corruption of the Oracle
database -- it must be bulletproof.
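A sketch of how qdisk and fencing appear together in /etc/cluster/cluster.conf, assuming IPMI-based
fence devices; every name, address, and credential below is a placeholder:

    <?xml version="1.0"?>
    <cluster name="oracle-ha" config_version="1">
      <!-- qdisk: one vote; a node is declared dead after 10 missed 1-second intervals -->
      <quorumd interval="1" tko="10" votes="1" label="oraqdisk"/>
      <clusternodes>
        <clusternode name="node1-priv" nodeid="1" votes="1">
          <fence>
            <method name="1">
              <device name="node1-ipmi"/>
            </method>
          </fence>
        </clusternode>
        <!-- node2 is defined the same way, pointing at its own fence device -->
      </clusternodes>
      <fencedevices>
        <fencedevice agent="fence_ipmilan" name="node1-ipmi"
                     ipaddr="10.0.0.101" login="admin" passwd="password"/>
      </fencedevices>
    </cluster>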
Red Hat Cluster Suite provides more fencing technologies than either Veritas Foundation Suite, or