within application servers must be configured correctly, so failover delay is minimized.
The back-side network is a private, dedicated network that should be configured as a four-port VLAN
if a non-private switch is used.
Most customers buy dual-ported NICs, which are not as reliable as two single-ported NICs. However,
bonding ports across different drivers is also not recommended (bonding a tg3 port and an e1000
port, for instance). If possible, use two outboard single-ported NICs. Servers whose outboard
ports use the same driver as the built-in ports (all e1000 ports, for instance) can safely cross-bond.
Connecting the ports to two different switches may also not work in some cases, so creating a fully
redundant bonded NIC pathway is harder than it should be. Since the back-side network exists primarily
to carry heartbeat traffic, a server whose bonded NIC fails is simply fenced, even though the server
itself is still up. Statistically, the cluster might fence a little more often, but that's about it.
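For reference, the failover bond described here might be set up on Red Hat Enterprise Linux 5 along
the following lines; the interface names (eth2, eth3), the bond0 label, and the addressing are
illustrative assumptions, not values from this example configuration:

    # /etc/modprobe.conf -- load the bonding driver for bond0;
    # mode=1 is active-backup (failover), miimon polls link state every 100 ms
    alias bond0 bonding
    options bond0 mode=1 miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface
    DEVICE=bond0
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

    # /etc/sysconfig/network-scripts/ifcfg-eth2 -- one slave (repeat for eth3)
    DEVICE=eth2
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

With mode=1, only one slave carries traffic at a time, which matches the failover-oriented
recommendation above.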
2.4. RAC/GFS Considerations
Oracle Clusterware implements Virtual IP routing so that target IP addresses of the failed node can
be quickly taken over by the surviving node. This means new connections see little or no delay.
In the GFS/RAC cluster, Oracle uses the back-side network to implement Cache Fusion through the
Global Cache Service (GCS), and database blocks can be moved between nodes over this link. This can
place extra load on the link, and for certain workloads, a second dedicated back-side network might
be required.
Bonded GCS links that use LACP (Link Aggregation Control Protocol) to aggregate multiple GbE links
for higher capacity are supported, but not extensively tested. Customers may also run the simple
two-NIC bond in load-balance mode, but the recommendation is to use this bond for failover,
especially in the two-node case.
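A minimal sketch of the LACP variant, assuming the switch ports are grouped into an 802.3ad
aggregation (the switch-side setup is vendor-specific and not shown):

    # /etc/modprobe.conf -- mode=4 is 802.3ad (LACP) aggregation;
    # substitute this for the mode=1 line shown earlier
    alias bond0 bonding
    options bond0 mode=4 miimon=100 lacp_rate=1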
Oracle GCS can also be implemented over InfiniBand using the Reliable Datagram Sockets (RDS)
protocol. This provides an extremely low latency, memory-to-memory connection. This strategy is
more often required in high node-count clusters, which implement data warehouses. In these larger
clusters, the inter-node traffic (and the GCS coherency protocol) easily exhausts the capacity of
conventional GbE/UDP links.
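Whichever interconnect carries GCS traffic, Oracle can be pointed at it explicitly through the
CLUSTER_INTERCONNECTS initialization parameter. A minimal sketch, assuming instance names orcl1 and
orcl2 and addresses on a hypothetical 192.168.2.0/24 dedicated interconnect:

    -- set per instance in the spfile; each instance names its own
    -- address on the dedicated interconnect network
    ALTER SYSTEM SET cluster_interconnects='192.168.2.1' SCOPE=spfile SID='orcl1';
    ALTER SYSTEM SET cluster_interconnects='192.168.2.2' SCOPE=spfile SID='orcl2';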
Oracle RAC has other strategies to preserve existing sessions and transactions from the failed node
(Oracle Transparent Session and Application Migration/Failover). Most customers do not implement
these features. However, they are available, and near non-stop failover is possible with RAC. These
features are not available in the Cold Failover configuration, so the client tier must be configured
accordingly.
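As an illustration of the client-side piece, Transparent Application Failover is typically enabled
through a FAILOVER_MODE clause in tnsnames.ora. The alias, virtual host names, service name, and
retry values below are assumptions for the sketch, not values from this configuration:

    # tnsnames.ora entry with TAF; in-flight SELECTs resume on the survivor
    orcl_taf =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
          (LOAD_BALANCE = yes))
        (CONNECT_DATA =
          (SERVICE_NAME = orcl)
          (FAILOVER_MODE =
            (TYPE = SELECT)
            (METHOD = BASIC)
            (RETRIES = 20)
            (DELAY = 5))))

The addresses point at the Clusterware virtual IPs described above, so a reconnect lands on the
surviving node without waiting for a dead-host TCP timeout.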
Oracle RAC is quite expensive, but can provide that last 5% of uptime that might make the extra cost
worth every nickel. A simple two-node Red Hat Cluster Suite Oracle Failover cluster requires only one
Enterprise Edition license. The two-node RAC/GFS cluster requires two Enterprise Edition licenses
and a separately priced license for RAC (and Partitioning).
2.5. Fencing Configuration
Fencing is a technique used to remove a cluster member from an active cluster, as determined by loss of
communication with the cluster. There are two fail-safe mechanisms in a typical Oracle HA configuration:
the quorum voting disk service, qdisk, and the cman heartbeat mechanism that operates over the
private, bonded network. If either node fails to "check in" within a prescribed time, actions are taken to
remove, or fence, the node from the rest of the active cluster. Fencing is the most important job that a
cluster product must do. Inconsistent or unreliable fencing can result in corruption of the Oracle
database -- it must be bulletproof.
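A sketch of how qdisk and fencing appear together in /etc/cluster/cluster.conf, assuming IPMI-based
fence devices; every name, address, and credential below is a placeholder:

    <?xml version="1.0"?>
    <cluster name="oracle-ha" config_version="1">
      <!-- qdisk: one vote; a node is declared dead after 10 missed 1-second intervals -->
      <quorumd interval="1" tko="10" votes="1" label="oraqdisk"/>
      <clusternodes>
        <clusternode name="node1-priv" nodeid="1" votes="1">
          <fence>
            <method name="1">
              <device name="node1-ipmi"/>
            </method>
          </fence>
        </clusternode>
        <!-- node2 is defined the same way, pointing at its own fence device -->
      </clusternodes>
      <fencedevices>
        <fencedevice agent="fence_ipmilan" name="node1-ipmi"
                     ipaddr="10.0.0.101" login="admin" passwd="password"/>
      </fencedevices>
    </cluster>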
Red Hat Cluster Suite provides more fencing technologies than either Veritas Foundation Suite, or