Installation guide
Section B.3: Failover and Recovery Scenarios
• All the heartbeat network cables are disconnected from a system.
• All the serial connections and network interfaces used for heartbeat communication fail.
If a total network connection failure occurs, both systems detect the problem, but they also detect that
the SCSI disk connections are still active. Therefore, services remain running on the systems and are
not interrupted.
If a total network connection failure occurs, diagnose the problem and then do one of the following:
• If the problem affects only one cluster system, relocate its services to the other system. Then,
correct the problem and relocate the services back to the original system.
• Manually stop the services on one cluster system. In this case, services do not automatically fail
over to the other system, so restart them manually on the other system. After the
problem is corrected, you can re-balance the services across the systems.
• Shut down one cluster system. In this case, the following occurs:
1. Services are stopped on the cluster system that is shut down.
2. The remaining cluster system detects that the system is being shut down.
3. Any services that were running on the system that was shut down are restarted on the remaining
cluster system.
4. If the system reboots, and can join the cluster (that is, the system can write to both quorum
partitions), services are re-balanced across the member systems, according to each service’s
placement policy.
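The shutdown sequence above can be sketched as a small model. This is an illustrative sketch only, not the cluster software's implementation: the Service class, the shut_down and rejoin functions, and the idea of reducing a placement policy to a single preferred member are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    preferred: str   # hypothetical stand-in for the service's placement policy
    running_on: str  # cluster system currently running the service


def shut_down(services, member, remaining):
    """Steps 1-3: services on the member being shut down are stopped
    and then restarted on the remaining cluster system."""
    for svc in services:
        if svc.running_on == member:
            svc.running_on = remaining


def rejoin(services, member, can_write_both_quorum_partitions):
    """Step 4: re-balance only if the rebooted system can join the cluster,
    that is, if it can write to both quorum partitions."""
    if not can_write_both_quorum_partitions:
        return False
    for svc in services:
        if svc.preferred == member:
            svc.running_on = member
    return True


services = [Service("db", "A", "A"), Service("web", "B", "B")]
shut_down(services, "A", "B")          # both services now run on B
rejoin(services, "A", True)            # "db" moves back to A
```

In this model, a system that reboots but cannot write to both quorum partitions does not rejoin, so its services stay on the remaining system.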
B.3.5 Remote Power Switch Connection Failure
If a query to a remote power switch connection fails, but both systems continue to have power, there
is no change in cluster behavior unless a cluster system attempts to use the failed remote power switch
connection to power-cycle the other system. The power daemon continually logs high-priority
messages indicating a power switch failure or a loss of connectivity to the power switch (for example,
because a cable has been disconnected).
If a cluster system attempts to use a failed remote power switch, services running on the system that
experienced the failure are stopped. However, to ensure data integrity, they are not failed over to the
other cluster system. Instead, they remain stopped until the hardware failure is corrected.
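The behavior described in this section can be summarized as a small decision function. This is a sketch under stated assumptions: the function name react_to_switch_failure and the action keys are hypothetical and do not correspond to any real cluster API; they only restate the rules in the two paragraphs above.

```python
def react_to_switch_failure(attempts_power_cycle):
    """Model the cluster's reaction to a failed remote power switch connection.

    attempts_power_cycle: True if a cluster system tries to use the failed
    connection to power-cycle the other system.
    """
    actions = {
        # The power daemon logs high-priority messages in either case.
        "log_high_priority": True,
        # Otherwise there is no change in cluster behavior.
        "stop_services": False,
        # Services are never failed over in this scenario (data integrity).
        "fail_over": False,
    }
    if attempts_power_cycle:
        # Services on the system that experienced the failure are stopped
        # and stay stopped until the hardware failure is corrected.
        actions["stop_services"] = True
    return actions


print(react_to_switch_failure(attempts_power_cycle=False))
print(react_to_switch_failure(attempts_power_cycle=True))
```

The "fail_over" entry is False on both paths, which reflects the rule that stopped services remain stopped rather than moving to the other cluster system.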
B.3.6 Quorum Daemon Failure
If a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum
partitions. If power switches are not used in the cluster, this error condition may result in services
being run on more than one cluster system, which can cause data corruption.