Managing HP Serviceguard A.12.00.00 for Linux, June 2014
10.8.5.1 Package Control Script Hangs or Failures
If a RUN_SCRIPT_TIMEOUT or HALT_SCRIPT_TIMEOUT value is set and the control script hangs
long enough to exceed the timeout, Serviceguard kills the script and marks the package
“Halted.” Similarly, when a package control script fails, Serviceguard kills the script and marks
the package “Halted.” In both cases, the following also apply:
• Control of the package will not be transferred.
• The run or halt instructions may not run to completion.
• Global switching will be disabled.
• The current node will be disabled from running the package.
Because the control script is terminated, such a failure may leave some of the package’s
resources active. Specifically:
• Volume groups may be left active.
• File systems may still be mounted.
• IP addresses may still be installed.
• Services may still be running.
In this kind of situation, Serviceguard will not restart the package without manual intervention. You
must clean up manually before restarting the package. Use the following steps as guidelines:
1. Perform application-specific cleanup. Undo any application-specific actions the control script
might have taken, so that the package can start successfully on an alternate node. This might
include shutting down application processes, removing lock files, and removing temporary
files.
2. Ensure that package IP addresses are removed from the system, using the cmmodnet(1m)
command. First determine which package IP addresses are installed by inspecting the output
of the ifconfig command. If any of the IP addresses specified in the package control script
appear in the ifconfig output as the inet addr: value of an ethX:Y block, use cmmodnet to
remove them:
cmmodnet -r -i <ip-address> <subnet>
where <ip-address> is the address identified above and <subnet> is the result of masking
the <ip-address> with the mask shown on the same line as the inet addr: value in the
ifconfig output.
3. Ensure that package volume groups are deactivated. First unmount any package logical
volumes that are being used for file systems. To find them, inspect the output of the
command df -l. If any package logical volumes, as specified by the LV[] array variables in
the package control script, appear under the “Filesystem” column, use fuser to kill any
processes still using them, then use umount to unmount them:
fuser -ku <logical-volume>
umount <logical-volume>
Next, deactivate the package volume groups. These are specified by the VG[] array entries
in the package control script:
vgchange -a n <volume-group>
4. Finally, re-enable the package for switching:
cmmodpkg -e <package-name>
If, after cleaning up the node on which the timeout occurred, you want that node to remain
available as an alternate for running the package, remember to re-enable the package to run
on that node:
cmmodpkg -e -n <node-name> <package-name>
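Step 2 above derives <subnet> by masking the IP address with the netmask shown in the
ifconfig output. The following is a minimal sketch of that arithmetic in POSIX shell. The
function name subnet_of is hypothetical and is not part of Serviceguard; the sketch assumes
IPv4 dotted-decimal input with no leading zeros in any octet.

```shell
# subnet_of <ip-address> <netmask>
# Hypothetical helper (not a Serviceguard command): prints the subnet
# obtained by ANDing each octet of the address with the matching mask octet.
# The body runs in a subshell, so the IFS change does not leak to the caller.
subnet_of() (
    IFS=.
    # Split both arguments on "." so that $1..$4 hold the address octets
    # and $5..$8 hold the mask octets.
    set -- $1 $2
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

subnet_of 192.168.1.37 255.255.255.0    # prints 192.168.1.0
```

The printed value is what would be passed as the <subnet> argument to cmmodnet -r -i.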
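The check in step 3 (comparing the LV[] entries against the “Filesystem” column of df -l)
can also be sketched as a small filter. The function name mounted_package_lvs is
hypothetical. The sketch assumes each device name appears in the df output exactly as it
is written in the LV[] array, and that df prints each file system on a single line; on some
systems df may show LVM volumes under a different name, such as a /dev/mapper path, in
which case the names would need to be adjusted by hand.

```shell
# mounted_package_lvs <logical-volume>...
# Hypothetical helper (not a Serviceguard command): reads df output on
# standard input and prints each listed logical volume that appears in
# the first ("Filesystem") column. It only reports; it unmounts nothing.
mounted_package_lvs() {
    awk -v lvs="$*" '
        BEGIN {
            # Build a lookup table from the space-separated volume list.
            n = split(lvs, names, " ")
            for (i = 1; i <= n; i++) wanted[names[i]] = 1
        }
        # Skip the header line, then match on the device column.
        NR > 1 && ($1 in wanted) { print $1 }
    '
}
```

For example, df -l | mounted_package_lvs /dev/vg01/lvol1 /dev/vg01/lvol2 would list
whichever of the two illustrative volumes is still mounted. Each volume it reports is a
candidate for the fuser and umount commands in step 3.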