Installation guide

Section 2.4: Steps for Setting Up and Connecting the Cluster Hardware
To set up a redundant Ethernet heartbeat channel, use a network crossover cable to connect a network
interface on one cluster system to a network interface on the other cluster system.
To set up a serial heartbeat channel, use a null modem cable to connect a serial port on one cluster
system to a serial port on the other cluster system. Be sure to connect corresponding serial ports
on the cluster systems; do not connect to the serial port that will be used for a remote power switch
connection. If support for more than two cluster members is added in the future, serial-based
heartbeat channels may be deprecated.
2.4.2 Configuring Power Switches
Power switches enable a cluster system to power-cycle the other cluster system before restarting its
services as part of the failover process. The ability to remotely disable a system ensures data
integrity is maintained under any failure condition. It is recommended that production environments
use power switches or watchdog timers in the cluster configuration. Only development (test)
environments should use a configuration without power switches (type "None"). Refer to Section 2.1.3,
Choosing the Type of Power Controller, for a description of the various types of power switches. Note
that within this section, the general term "power switch" also includes watchdog timers.
In a cluster configuration that uses physical power switches, each cluster system’s power cable is
connected to a power switch, and the switch is controlled through either a serial or network
connection (depending on switch type). When failover occurs, a cluster system can use this connection
to power-cycle the other cluster system before restarting its services.
Power switches protect against data corruption if an unresponsive (or hanging) system becomes
responsive after its services have failed over, and issues I/O to a disk that is also receiving I/O from
the other cluster system. In addition, if a quorum daemon fails on a cluster system, the system is no
longer able to monitor the quorum partitions. If power switches or watchdog timers are not used in
the cluster, then this error condition may result in services being run on more than one cluster system,
which can cause data corruption and possibly system crashes.
It is strongly recommended to use power switches in a cluster. However, administrators who are aware
of the risks may choose to set up a cluster without power switches.
A cluster system may hang for a few seconds if it is swapping or has a high system workload. For this
reason, adequate time is allowed (typically 12 seconds) before concluding that another system has failed.
A cluster system may "hang" indefinitely because of a hardware failure or kernel error. In this case, the
other cluster system will notice that the hung system is not updating its timestamp on the quorum
partitions, and is not responding to pings over the heartbeat channels.
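The failure check described above can be sketched in a few lines of Python. This is only an
illustration, not the cluster software's actual logic: the names PeerObservation and
peer_appears_failed are hypothetical, and requiring both signals to be stale is an assumption drawn
from the two conditions named in the text. The 12-second value is the typical figure quoted above.

```python
from dataclasses import dataclass

# "Typically 12 seconds" per the text; used here as an illustrative default.
FAILURE_TIMEOUT_SECONDS = 12.0


@dataclass
class PeerObservation:
    """What one cluster system has last seen from the other (hypothetical type)."""
    last_quorum_timestamp: float  # time the peer last updated its quorum-partition timestamp
    last_heartbeat_reply: float   # time the peer last answered a ping on any heartbeat channel


def peer_appears_failed(obs: PeerObservation, now: float,
                        timeout: float = FAILURE_TIMEOUT_SECONDS) -> bool:
    """Return True only if both signals have been stale for longer than the timeout.

    Assumption: the peer is judged failed when it has stopped updating the
    quorum partitions AND has stopped answering heartbeat pings.
    """
    quorum_stale = (now - obs.last_quorum_timestamp) > timeout
    heartbeat_silent = (now - obs.last_heartbeat_reply) > timeout
    return quorum_stale and heartbeat_silent
```

A system that is merely swapping for a few seconds stays under the timeout, so it is not treated as
failed. A peer that still answers pings is likewise not treated as failed in this sketch, even if its
quorum timestamp is old.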
If a cluster system determines that a hung system is down, and power switches are used in the cluster,
the cluster system will power-cycle the hung system before restarting its services. A system in a
cluster configured to use watchdog timers will reboot itself under most system hangs. In either case,
the hung system reboots in a clean state, which prevents it from issuing I/O and corrupting service data.
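The ordering that matters here (deal with the hung system first, restart its services second) can be
sketched as follows. This is a minimal illustration under stated assumptions: PowerControl and
failover_steps are hypothetical names, the step strings are descriptive placeholders rather than real
commands, and the behavior shown for the watchdog case is an assumption based on the paragraph above.

```python
from enum import Enum
from typing import List


class PowerControl(Enum):
    """Kinds of power control discussed in this section (hypothetical enum)."""
    SWITCH = "switch"      # physical power switch controlled by the other system
    WATCHDOG = "watchdog"  # watchdog timer that reboots a hung system
    NONE = "none"          # no power control; test environments only


def failover_steps(power_control: PowerControl) -> List[str]:
    """Return the ordered steps a surviving system takes once its peer is judged down."""
    steps = []
    if power_control is PowerControl.SWITCH:
        # The survivor power-cycles the hung system before touching its services.
        steps.append("power-cycle the hung system")
    elif power_control is PowerControl.WATCHDOG:
        # Assumption: the hung system's own watchdog timer reboots it.
        steps.append("hung system is rebooted by its watchdog timer")
    else:
        # Nothing stops the hung system from issuing I/O later.
        steps.append("warning: hung system not reset, data corruption possible")
    steps.append("restart the hung system's services locally")
    return steps
```

With PowerControl.NONE the services are still restarted, but nothing guarantees that the hung system
has stopped issuing I/O, which is the risk this section warns about.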