
Force Import and Deport After Node Failure
After certain failures, packages configured with VxVM disk groups will fail to start, logging an
error such as the following in the package log file:
vxdg: Error dg_01 may still be imported on ftsys9
ERROR: Function check_dg failed
This can happen if a package is running on a node that fails before the package control
script can deport the disk group. In that case, the host name of the failed node is still
written in the disk group header.
When the package starts up on another node in the cluster, a series of messages is printed in the
package log file.
Follow the instructions in the messages: use the force import option (-C) to allow the current node
to import the disk group, then deport the disk group so that it can be used again by the
package. Example:
vxdg -tfC import dg_01
vxdg deport dg_01
The force import will clear the host name currently written on the disks in the disk group, after which
you can deport the disk group without error so it can then be imported by a package running on
a different node.
CAUTION: This force import procedure should only be used when you are certain the disk is not
currently being accessed by another node. If you force import a disk that is already being accessed
on another node, data corruption can result.
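As an illustrative precaution (not part of the required procedure), you can confirm from each of the other cluster nodes that the disk group is not imported there before forcing the import. For example, assuming the other node (ftsys9 in the earlier message) is still reachable:
vxdg list    # run on the other node; dg_01 should not appear among its imported disk groups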
Package Movement Errors
These errors are similar to the system administration errors, but they are caused specifically by
errors in the control script for legacy packages. The best way to prevent these errors is to test your
package control script before putting your high availability application on line.
Adding a set -x statement in the second line of a legacy package control script will cause
additional details to be logged into the package log file, which can give you more information
about where your script may be failing.
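For example (a sketch only; the first line shown is illustrative, and the generated contents of your control script will differ), the beginning of a legacy control script might look like this after the change:
#!/usr/bin/sh
set -x    # log each command and its arguments to the package log file
# ... remainder of the generated control script is unchanged ...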
Node and Network Failures
These failures cause Serviceguard to transfer control of a package to another node. This is the
normal action of Serviceguard, but you need to be able to recognize when a transfer has taken
place and then decide whether to leave the cluster in its current condition or restore it to its original condition.
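A quick way to recognize that a transfer has taken place is to check where each package is currently running, for example with the cmviewcl command:
cmviewcl -v    # displays cluster, node, and package status, including the node on which each package is running
Compare the node shown for each package with the node it normally runs on to decide whether to leave things as they are or to move the package back.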
Possible node failures can be caused by the following conditions:
•   HPMC. This is a High Priority Machine Check, a system panic caused by a hardware error.
•   TOC (Transfer of Control)
•   Panics
•   Hangs
•   Power failures
In the event of a TOC, a system dump is performed on the failed node and numerous messages
are also displayed on the console.
You can use the following commands to check the status of your network and subnets:
•   netstat -in - to display LAN status and check to see if the package IP is stacked on the LAN card.
•   lanscan - to see if the LAN is on the primary interface or has switched to the standby interface.