Specifications
determines whether the failing resource, or a resource that depends upon the failing resource,
has any shared equivalencies with a resource on any other system, and selects the one on the
highest-priority server that is alive. Only one equivalent resource can be active at a time.
If no equivalency exists, the recover process halts.
If a shared equivalency is found and selected, LifeKeeper initiates inter-server recovery: the
recover process sends a message through the LCM to the LCD process on the selected
backup system, which contains the shared equivalent resource.
9. lcdrecover process coordinates transfer. The LCD process on the backup server forks the
lcdrecover process to coordinate the transfer of the equivalent resource.
10. Activation on backup server. The lcdrecover process finds the equivalent resource and
determines whether it depends upon any resources that are not in-service. lcdrecover runs
the restore script (part of the resource recovery action scripts) for each required resource,
placing the resources in-service.
The act of restoring a resource on a backup server may result in the need for more shared
resources to be transferred from the primary system. Messages pass to and from the primary
system, indicating resources that need to be removed from service on the primary server and
then brought into service on the selected backup server to provide full functionality of the
critical applications. This activity continues until no new shared resources are needed and all
necessary resource instances on the backup server are restored.
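
The selection and activation logic described in the steps above can be sketched roughly as follows. This is an illustrative model only, not LifeKeeper code: the names Equivalency, select_equivalency, and restore_in_order are hypothetical, the rule that a lower number means a higher priority is an assumption, and the restore callback merely stands in for the restore script. The message exchange with the primary server is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Set


@dataclass
class Equivalency:
    """Hypothetical record of one shared equivalency."""
    server: str    # backup server holding the equivalent resource
    tag: str       # resource tag on that server
    priority: int  # assumption: lower number means higher priority
    alive: bool    # whether that server is currently alive


def select_equivalency(candidates: Iterable[Equivalency]) -> Optional[Equivalency]:
    """Pick the equivalency on the highest-priority alive server.

    Returns None when no alive server holds an equivalency, which
    corresponds to the recover process halting.
    """
    alive = [c for c in candidates if c.alive]
    if not alive:
        return None
    return min(alive, key=lambda c: c.priority)


def restore_in_order(
    tag: str,
    depends_on: Dict[str, List[str]],
    in_service: Set[str],
    restore: Callable[[str], None],
) -> List[str]:
    """Restore `tag` after every resource it depends on.

    Resources already in service are skipped. `in_service` is updated
    in place, and the list of tags restored (in order) is returned.
    """
    restored: List[str] = []

    def visit(current: str, path: FrozenSet[str]) -> None:
        if current in in_service:
            return
        if current in path:
            raise ValueError(f"dependency cycle at {current!r}")
        for dep in depends_on.get(current, []):
            visit(dep, path | {current})
        restore(current)  # stand-in for running the restore script
        in_service.add(current)
        restored.append(current)

    visit(tag, frozenset())
    return restored
```

Restoring dependencies first mirrors the requirement that every resource the equivalent resource depends upon is in service before the resource itself is placed in service.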
Server Failure Recovery Scenario
The LifeKeeper Communications Manager (LCM) has two functions:
• Messaging. The LCM serves as a conduit through which LifeKeeper sends messages during
recovery, configuration, or when running an audit.
• Failure detection. The LCM also plays a role in detecting whether or not a server has failed.
LifeKeeper has a built-in heartbeat signal that periodically notifies each server in the configuration that
its paired server is operating. If a server fails to receive the heartbeat message through one of the
communications paths, LifeKeeper marks that path DEAD.
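
As a rough illustration of the per-path bookkeeping described above, the following sketch marks a communications path DEAD when no heartbeat has arrived on it within a timeout. The CommPath class, the update_path_states function, and the timeout parameter are hypothetical names chosen for this example; the actual heartbeat interval and the number of missed heartbeats LifeKeeper tolerates are not stated here and are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CommPath:
    """Hypothetical record for one communications path to a paired server."""
    name: str
    last_heartbeat: float  # time (in seconds) the last heartbeat arrived
    state: str = "ALIVE"


def update_path_states(paths: Iterable[CommPath], now: float, timeout: float) -> List[str]:
    """Mark each path DEAD if its last heartbeat is older than `timeout`.

    Returns the names of the paths that are DEAD after the update.
    """
    dead = []
    for path in paths:
        if now - path.last_heartbeat > timeout:
            path.state = "DEAD"
            dead.append(path.name)
        else:
            path.state = "ALIVE"
    return dead
```

Tracking state per path rather than per server keeps the loss of a single path distinct from the loss of the server itself.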
The following figure illustrates the recovery tasks when the LCM heartbeat mechanism detects a
server failure.