Managing Serviceguard Eighteenth Edition, September 2010
Suppose a script run by pkg1 does a cmmodpkg -d of pkg2, and a script run by pkg2
does a cmmodpkg -d of pkg1. If both pkg1 and pkg2 start at the same time, the pkg1
script tries to cmmodpkg pkg2, but that command has to wait for pkg2 startup to
complete. Meanwhile, the pkg2 script tries to cmmodpkg pkg1, but that command has to
wait for pkg1 startup to complete. Neither startup can finish until the other does,
which results in a command loop.
To avoid this situation, it is a good idea to specify a run_script_timeout and
halt_script_timeout (page 291) for all packages, especially packages that use Serviceguard
commands in their external scripts. If a timeout is not specified and your configuration
has a command loop as described above, inconsistent results can occur, including a
hung cluster.
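As a minimal sketch, the two timeouts might appear in a modular package configuration file as follows. The values are hypothetical placeholders (assumed here to be in seconds), not recommendations; see the parameter descriptions (page 291) for the permitted values and defaults.

```
# Hypothetical values for illustration only; size them to how long the
# package's scripts normally take to run and to halt.
run_script_timeout     600
halt_script_timeout    600
```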
Determining Why a Package Has Shut Down
You can use an external script (or the CUSTOMER DEFINED FUNCTIONS area of a legacy
package control script) to find out why a package has shut down.
Serviceguard sets the environment variable SG_HALT_REASON in the package control
script to one of the following values when the package halts:
• failure - set if the package halts because of the failure of a subnet, resource, or
service it depends on
• user_halt - set if the package is halted by a cmhaltpkg or cmhaltnode
command, or by corresponding actions in Serviceguard Manager
• automatic_halt - set if the package is failed over automatically because of the
failure of a package it depends on, or is failed back to its primary node
automatically (failback_policy = automatic)
You can add custom code to the package to interrogate this variable, determine why
the package halted, and take appropriate action. For legacy packages, put the code in
the customer_defined_halt_cmds() function in the CUSTOMER DEFINED
FUNCTIONS area of the package control script; see “Adding Customer Defined Functions
to the Package Control Script” (page 380). For modular packages, put the code in the
package’s external script; see “About External Scripts” (page 197).
For example, if a database package is being halted by an administrator
(SG_HALT_REASON set to user_halt) you would probably want the custom code
to perform an orderly shutdown of the database; on the other hand, a forced shutdown
might be needed if SG_HALT_REASON is set to failure, indicating that the
package is halting abnormally (for example, because of the failure of a service it depends
on).
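The database example above could be sketched in shell roughly as follows. This is an illustration under stated assumptions, not code supplied by Serviceguard: the function names handle_halt_reason, shutdown_db_orderly, and shutdown_db_forced are hypothetical, the two shutdown functions only print a message where real database commands would go, and the handling of automatic_halt and of unrecognized values is an assumption that you should adapt to your application.

```shell
#!/bin/sh
# Sketch only: choose a shutdown mode based on SG_HALT_REASON.
# All function names below are hypothetical placeholders.

shutdown_db_orderly() {
    # Replace with your database's orderly shutdown command.
    echo "orderly database shutdown"
}

shutdown_db_forced() {
    # Replace with your database's forced (abort) shutdown command.
    echo "forced database shutdown"
}

handle_halt_reason() {
    case "${SG_HALT_REASON:-}" in
        user_halt)
            # Halted by an administrator: shut down cleanly.
            shutdown_db_orderly
            ;;
        failure)
            # Package is halting abnormally: a forced shutdown may be needed.
            shutdown_db_forced
            ;;
        automatic_halt)
            # Assumption: treat automatic failover or failback like a planned halt.
            shutdown_db_orderly
            ;;
        *)
            # Assumption: report an unset or unrecognized value, then shut down cleanly.
            echo "unrecognized SG_HALT_REASON: '${SG_HALT_REASON:-}'" >&2
            shutdown_db_orderly
            ;;
    esac
}
```

For a modular package, handle_halt_reason would be called from the part of the external script that runs when the package halts; for a legacy package, the same case statement could be placed in customer_defined_halt_cmds().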
last_halt_failed
cmviewcl -v -f line displays a last_halt_failed flag.