Continentalclusters Version A.05.00 Release Notes, December 2004 (T2346-90006)


Known Problems and Workarounds


Applications hang when all PV links are down

What is the problem? When all PV links to a disk array used by a primary or recovery package are down, the package applications that access that array hang indefinitely. The applications do not detect an error and cannot be killed. Even running cmhaltpkg will not stop them.

What is the workaround? By default, LVM retries access forever after a failure. You can recover from this application hang in any of the following ways:

• Reboot the node. The package will then move to another node in the cluster.

• Fix the problem so that one or more PV links are restored. The application will then continue normally.

• Change the default LVM behavior so that LVM does not retry forever. The lvchange -t option sets a timeout value: if access to the array has failed and the timeout period has expired, LVM returns an error to the application instead of retrying. The command uses the following format:

lvchange -t <seconds> /dev/<vgname>/<lvname>

Run this command from one host, once for each logical volume in the package volume groups. A starting timeout value of 60 seconds is suggested; do not set the timeout to less than 60 seconds.

Once the lvchange -t command is run on a host, all other hosts will automatically inherit the new timeout value for the logical volumes.

You can view the current timeout value by running the lvdisplay command and checking the value listed for “IO Timeout (Seconds)”. If the value is displayed as “default,” LVM uses the default behavior and retries forever.
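The per-volume timeout procedure above can be scripted. The following Python sketch is illustrative only and is not part of Continentalclusters or LVM. The function names, the example paths /dev/vgpkg1/lvol1 and /dev/vgpkg1/lvol2, and the assumption that lvdisplay prints the “IO Timeout (Seconds)” label followed by whitespace and the value on a single line are all hypothetical. The sketch builds one lvchange -t command per logical volume and refuses timeouts below 60 seconds. It also reads the timeout back from lvdisplay output, treating “default” as "retry forever". By default it only prints the commands (a dry run).

```python
import re
import shlex
import subprocess

# Per these release notes, the LVM IO timeout must not be set below 60 seconds.
MIN_TIMEOUT_SECONDS = 60


def lvchange_commands(lv_paths, seconds=MIN_TIMEOUT_SECONDS):
    """Build one 'lvchange -t <seconds> <lv_path>' argument list per logical volume."""
    if seconds < MIN_TIMEOUT_SECONDS:
        raise ValueError(
            f"timeout must be at least {MIN_TIMEOUT_SECONDS} seconds, got {seconds}"
        )
    return [["lvchange", "-t", str(seconds), path] for path in lv_paths]


def apply_timeouts(lv_paths, seconds=MIN_TIMEOUT_SECONDS, dry_run=True):
    """Print the commands (dry run) or execute them on this host only."""
    for cmd in lvchange_commands(lv_paths, seconds):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if lvchange exits non-zero.
            subprocess.run(cmd, check=True)


def io_timeout(lvdisplay_output):
    """Return the IO timeout in seconds, or None when LVM retries forever.

    Assumes (hypothetically) a line of the form
    'IO Timeout (Seconds)    <value>' in the lvdisplay output.
    """
    match = re.search(r"IO Timeout \(Seconds\)\s+(\S+)", lvdisplay_output)
    if match is None:
        raise ValueError("no 'IO Timeout (Seconds)' field found")
    value = match.group(1)
    # "default" means no timeout is set, so LVM keeps retrying indefinitely.
    return None if value == "default" else int(value)


if __name__ == "__main__":
    # Hypothetical package volume group and logical volume names.
    apply_timeouts(["/dev/vgpkg1/lvol1", "/dev/vgpkg1/lvol2"], seconds=60)
    print(io_timeout("IO Timeout (Seconds)        default"))
```

Other hosts inherit the new timeout, so the sketch runs the commands on the local host only (dry_run=False) and does not loop over cluster nodes. Supplying the list of logical volumes in the package volume groups is left to the caller.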