Managing Serviceguard A.11.20, March 2013


Disk protection is provided by separate products, such as Mirrordisk/UX in LVM or Veritas mirroring

in VxVM and related products. In addition, separately available EMS disk monitors allow you to

notify operations personnel when a specific failure, such as a lock disk failure, takes place. Refer

to the manual Using High Availability Monitors, which you can find at the address given in the

preface to this manual.

Serviceguard does not respond directly to power failures, although a loss of power to an individual

cluster component may appear to Serviceguard like the failure of that component, and will result

in the appropriate switching behavior. Power protection is provided by HP-supported uninterruptible

power supplies (UPS).

Responses to Package and Service Failures

In the default case, the failure of a failover package, of a generic resource, or of a service within the

package causes Serviceguard to shut the package down by running the control script with the ‘stop’ parameter

and then to restart the package on an alternate node. A package will also fail if it is configured

to have a dependency on another package, and that package fails. If the package manager

receives a report of an EMS (Event Monitoring Service) event showing that a configured resource

dependency is not met, the package fails and tries to restart on the alternate node.

You can modify this default behavior by specifying that the node should halt (system reset) before

the transfer takes place. You do this by setting failfast parameters in the package configuration

file.

In cases where package shutdown might hang, leaving the node in an unknown state, failfast

options can provide a quick failover, after which the node will be cleaned up on reboot. Remember,

however, that a system reset causes all packages on the node to halt abruptly without a clean

shutdown.

The settings of the failfast parameters in the package configuration file determine the behavior of

the package and the node in the event of a package or resource failure:

• If service_fail_fast_enabled is set to yes in the package configuration file,

Serviceguard will halt the node with a system reset if there is a failure of that specific service.

• If node_fail_fast_enabled is set to yes in the package configuration file, and the

package fails, Serviceguard will halt (system reset) the node on which the package is running.

NOTE: In a very few cases, Serviceguard will attempt to reboot the system before a system reset

when this behavior is specified. If there is enough time to flush the buffers in the buffer cache, the

reboot succeeds, and a system reset does not take place. Either way, the system will be guaranteed

to come down within a predetermined number of seconds.
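
The failfast settings described above can be summarized as a small decision function. This is only an illustrative sketch in Python, not Serviceguard code: the function name, its arguments, and the returned strings are hypothetical, and the caller is assumed to state explicitly whether a service failed and whether the package failed.

```python
def failure_response(node_fail_fast_enabled: bool,
                     service_fail_fast_enabled: bool,
                     service_failed: bool,
                     package_failed: bool) -> str:
    """Return the response suggested by the failfast settings (hypothetical model)."""
    # service_fail_fast_enabled = yes: a failure of that service resets the node.
    if service_failed and service_fail_fast_enabled:
        return "system reset"
    # node_fail_fast_enabled = yes: a package failure resets the node.
    if package_failed and node_fail_fast_enabled:
        return "system reset"
    # Default case: halt the package and restart it on an alternate node.
    return "halt package and restart on alternate node"


if __name__ == "__main__":
    print(failure_response(False, True, service_failed=True, package_failed=False))
    print(failure_response(False, False, service_failed=True, package_failed=True))
```

The sketch deliberately ignores the rare reboot-before-reset case mentioned in the NOTE, since in both cases the node comes down.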

“Choosing Switching and Failover Behavior” (page 137) provides advice on choosing appropriate

failover behavior.

Responses to Package and Generic Resources Failures

In a package that is configured with a generic resource and is running, failure of a resource prompts

the Serviceguard Package Manager to take appropriate action based on the style of the package.

For failover packages, the package is halted on the node on which the generic resource failure occurred

and started on an available alternative node. For multi-node packages, failure of a generic resource

causes the package to be halted only on the node on which the failure occurred.
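
The difference between the two package styles can be illustrated with a short Python sketch. The function and argument names are hypothetical, and picking the first available alternate node is an assumption made for simplicity; the sketch does not model how Serviceguard actually selects the node.

```python
def nodes_after_resource_failure(package_type, running_on, failed_node, available_nodes):
    """Return the nodes running the package after a generic resource failure (hypothetical model)."""
    if package_type == "failover":
        # Halted on the failed node, started on an available alternative node.
        # Assumption: take the first available node that is not the failed one.
        alternates = [n for n in available_nodes if n != failed_node]
        return alternates[:1]
    if package_type == "multi_node":
        # Halted only on the node where the failure occurred.
        return [n for n in running_on if n != failed_node]
    raise ValueError(f"unsupported package type: {package_type}")


if __name__ == "__main__":
    print(nodes_after_resource_failure("failover", ["node1"], "node1", ["node1", "node2"]))
    print(nodes_after_resource_failure("multi_node", ["node1", "node2"], "node1", ["node1", "node2"]))
```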

• In the case of simple resources, failure of a resource must trigger the monitoring script to set the

status of the resource to 'down' using the cmsetresource command.

• In the case of extended resources, the value fetched by the monitoring script can be set using the

cmsetresource command.

The Serviceguard Package Manager evaluates this value against the

generic_resource_up_criteria set for a resource in the packages where it is configured.
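
To make the evaluation step concrete, the following Python sketch compares a value reported for an extended resource against an up-criteria expression. It is a simplified model, not the Package Manager's implementation: the function name is hypothetical, and the criteria syntax is assumed to be a single comparison operator followed by an integer (for example ">= 10").

```python
import operator

# Comparison operators assumed for this sketch; two-character symbols are listed first
# so that ">=" is not mistaken for ">".
_OPERATORS = [
    ("==", operator.eq), ("!=", operator.ne),
    (">=", operator.ge), ("<=", operator.le),
    (">", operator.gt), ("<", operator.lt),
]


def resource_is_up(value: int, up_criteria: str) -> bool:
    """Return True if value satisfies the (assumed) up-criteria expression."""
    text = up_criteria.strip()
    for symbol, compare in _OPERATORS:
        if text.startswith(symbol):
            threshold = int(text[len(symbol):].strip())
            return compare(value, threshold)
    raise ValueError(f"unrecognized up criteria: {up_criteria!r}")


if __name__ == "__main__":
    print(resource_is_up(12, ">= 10"))
    print(resource_is_up(5, ">= 10"))
```

In this model, a value that does not satisfy the criteria corresponds to the resource being treated as failed for that package.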
