Managing Serviceguard Eighteenth Edition, September 2010

component, and will result in the appropriate switching behavior. Power protection is provided by HP-supported uninterruptible power supplies (UPS).

Responses to Package and Service Failures

In the default case, the failure of a failover package, or of a service within the package, causes the package to shut down by running the control script with the ‘stop’ parameter, and then to restart on an alternate node. A package will also fail if it is configured to have a dependency on another package, and that package fails. If the package manager receives a report of an EMS (Event Monitoring Service) event showing that a configured resource dependency is not met, the package fails and tries to restart on the alternate node.

You can modify this default behavior by specifying that the node should halt (system reset) before the transfer takes place. You do this by setting failfast parameters in the package configuration file.

In cases where package shutdown might hang, leaving the node in an unknown state, failfast options can provide a quick failover, after which the node will be cleaned up on reboot. Remember, however, that a system reset causes all packages on the node to halt abruptly without a clean shutdown.

The settings of the failfast parameters in the package configuration file determine the behavior of the package and the node in the event of a package or resource failure:

• If service_fail_fast_enabled is set to yes in the package configuration file, Serviceguard will halt the node with a system reset if there is a failure of that specific service.

• If node_fail_fast_enabled is set to yes in the package configuration file, and the package fails, Serviceguard will halt (system reset) the node on which the package is running.
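
The two settings above might appear in a package configuration file roughly as follows. This is a sketch only: the parameter names service_fail_fast_enabled and node_fail_fast_enabled come from the text, while the service name example_service and the layout of the excerpt are assumptions made for illustration. Check them against the template generated for your own package.

```
# Hypothetical excerpt from a package configuration file.

# Package-level setting: if the package fails, Serviceguard halts
# (system reset) the node on which the package is running.
node_fail_fast_enabled        yes

# Service-level setting: if this specific service fails, Serviceguard
# halts the node with a system reset.
# "example_service" is an illustrative name, not taken from the manual.
service_name                  example_service
service_fail_fast_enabled     yes
```

Because a system reset halts every package on the node without a clean shutdown, it may be preferable to enable these settings only for packages or services whose shutdown is known to hang.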

NOTE: In very few cases, Serviceguard will attempt to reboot the system before a system reset when this behavior is specified. If there is enough time to flush the buffers in the buffer cache, the reboot succeeds, and a system reset does not take place. Either way, the system is guaranteed to come down within a predetermined number of seconds.

“Choosing Switching and Failover Behavior” (page 176) provides advice on choosing appropriate failover behavior.

Service Restarts

You can allow a service to restart locally following a failure. To do this, you indicate a number of restarts for each service in the package control script. When a service starts, the variable RESTART_COUNT is set in the service’s environment. The service, as it executes, can examine this variable to see whether it has been restarted after a failure, and if so, it can take appropriate action such as cleanup.
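
A service's startup script could examine RESTART_COUNT along the following lines. This is a minimal sketch, not taken from the manual: the function name start_mode is hypothetical, and the sketch assumes that the variable is unset or 0 on a first start and holds a positive integer after a restart.

```shell
#!/bin/sh
# Hypothetical startup logic for a Serviceguard service.
# RESTART_COUNT is set in the service's environment by Serviceguard;
# the meaning of its values (0 = first start) is an assumption here.

# Print "restart" if the service appears to have been restarted after
# a failure, otherwise print "fresh".
start_mode() {
    count="${RESTART_COUNT:-0}"

    # Treat empty or non-numeric values as a first start.
    case "$count" in
        ''|*[!0-9]*) count=0 ;;
    esac

    if [ "$count" -gt 0 ]; then
        echo "restart"
    else
        echo "fresh"
    fi
}

mode=$(start_mode)
if [ "$mode" = "restart" ]; then
    # Cleanup would go here, for example removing stale lock files
    # left behind by the failed instance.
    echo "service restarted after a failure: running cleanup"
else
    echo "service starting for the first time"
fi
```

Keeping the check in a small function makes it easy to run the script by hand, outside the cluster, with RESTART_COUNT set to different values.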
