Installation guide
B.3 Failover and Recovery Scenarios
Understanding how a cluster behaves when significant events occur helps you manage it properly.
Note that cluster behavior depends on whether power switches are used in the configuration.
Power switches enable the cluster to maintain complete data integrity under all failure
conditions.
The following sections describe how the system will respond to various failure and error scenarios.
B.3.1 System Hang
In a cluster configuration that uses power switches, if a system hangs, the cluster behaves as follows:
1. The functional cluster system detects that the hung cluster system is not updating its timestamp on
the quorum partitions and is not communicating over the heartbeat channels.
2. The functional cluster system power-cycles the hung system. Alternatively, if watchdog timers are
in use, a failed system will reboot itself.
3. The functional cluster system restarts any services that were running on the hung system.
4. If the previously hung system reboots, and can join the cluster (that is, the system can write to
both quorum partitions), services are re-balanced across the member systems, according to each
service’s placement policy.
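The detection and recovery steps above can be sketched roughly as follows. This is a minimal illustration, not the cluster software's actual implementation: the `MemberView` class, the function names, and the `timeout` parameter are all hypothetical, and the real detection interval is not stated in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemberView:
    """What the functional member observes about its peer (hypothetical model)."""
    last_quorum_timestamp: float  # last timestamp the peer wrote to the quorum partitions
    heartbeat_ok: bool            # True if any heartbeat channel still gets a response


def is_hung(peer: MemberView, now: float, timeout: float) -> bool:
    """Step 1: the peer counts as hung only if its quorum timestamp is stale
    AND it is silent on the heartbeat channels. `timeout` is an assumed value."""
    timestamp_stale = (now - peer.last_quorum_timestamp) > timeout
    return timestamp_stale and not peer.heartbeat_ok


def recover_with_power_switch(peer: MemberView, now: float, timeout: float) -> List[str]:
    """Steps 2-3: return the ordered recovery actions as plain strings."""
    if not is_hung(peer, now, timeout):
        return []
    return ["power-cycle peer", "restart peer's services locally"]
```

In this sketch, a peer with a stale timestamp but a working heartbeat channel is not treated as hung, which mirrors the two conditions listed in step 1.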
In a cluster configuration that does not use power switches, if a system hangs, the cluster behaves as
follows:
1. The functional cluster system detects that the hung cluster system is not updating its timestamp on
the quorum partitions and is not communicating over the heartbeat channels.
2. Optionally, if watchdog timers are used, the failed system will reboot itself.
3. The functional cluster system sets the status of the hung system to DOWN on the quorum
partitions, and then restarts the hung system’s services.
4. If the hung system becomes active, it notices that its status is DOWN and initiates a system
reboot. If the system remains hung, manually power-cycle it so that it can resume cluster
operation.
5. If the previously hung system reboots, and can join the cluster, services are re-balanced across the
member systems, according to each service’s placement policy.
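The sequence without power switches can be illustrated with a small sketch of the DOWN status handshake. It assumes a simplified model in which each member has one status record on the quorum partitions. The `Status`, `QuorumRecord`, `mark_down_and_take_over`, and `on_wake` names are hypothetical and do not correspond to actual cluster software interfaces.

```python
from enum import Enum
from typing import List


class Status(Enum):
    UP = "UP"
    DOWN = "DOWN"


class QuorumRecord:
    """Hypothetical stand-in for one member's status entry on the quorum partitions."""

    def __init__(self) -> None:
        self.status = Status.UP


def mark_down_and_take_over(record: QuorumRecord, services: List[str]) -> List[str]:
    """Step 3: the functional member marks the hung peer DOWN first,
    then restarts the peer's services itself."""
    record.status = Status.DOWN
    return list(services)  # services now owned by the functional member


def on_wake(record: QuorumRecord) -> str:
    """Step 4: a member that becomes active again checks its own status
    and reboots if it has been marked DOWN."""
    return "reboot" if record.status is Status.DOWN else "continue"
```

For example, after `mark_down_and_take_over(record, ["service-a"])` has run, `on_wake(record)` returns `"reboot"`. By that point the member's services are already running on the functional system, as step 3 describes.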