Continentalclusters Version A.05.00 Release Notes, December 2004 (T2346-90006)
Known Problems and Workarounds
Applications hang when all PV links are down
What is the problem? When all PV links to a disk array used by a primary
or recovery package are down, the package applications accessing that
array hang indefinitely. Because LVM never returns an I/O error, the
applications cannot detect the failure, and they cannot be killed. Even
cmhaltpkg does not stop them.
What is the workaround? By default, LVM retries access indefinitely
after a failure. You can recover from the resulting application hang in
any of the following ways:
• Reboot the node. The package will then move to another node in the
cluster.
• Repair the failure so that at least one PV link is restored. The
application then continues normally.
• Change the default LVM behavior so that LVM does not retry forever.
The lvchange -t option sets a timeout value: if access to the array has
failed and the timeout period expires, LVM returns an error to the
application. The command uses the following format:
lvchange -t <seconds> /dev/<vgname>/<lvname>
Run this command from one host, on every logical volume in the package
volume groups. A timeout of 60 seconds is a suggested starting value; do
not set the timeout to less than 60 seconds.
Once the lvchange -t command is run on a host, all other hosts will
automatically inherit the new timeout value for the logical volumes.
You can view the current timeout value by running the lvdisplay
command and checking the value listed for “IO Timeout (Seconds)”.
If the value is displayed as “default,” then LVM will use the default
behavior and retry forever.
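The lvchange procedure above can be partly scripted. The following is a minimal POSIX shell sketch, not an HP-supplied tool: the function names lv_timeout_cmds and check_lv_timeout, and the volume group /dev/vgpkg1 in the usage comments, are hypothetical, and the parsing assumes the HP-UX vgdisplay -v and lvdisplay output layouts quoted in the header comment. It prints an lvchange -t command for every logical volume in a volume group instead of running them, and classifies the "IO Timeout (Seconds)" value that lvdisplay reports.

```shell
#!/bin/sh
# Sketch only (not an HP-supplied tool). Function names are hypothetical.
# Assumed HP-UX output layouts:
#   vgdisplay -v : "   LV Name                     /dev/vg01/lvol1"
#   lvdisplay    : "IO Timeout (Seconds)        default"

MIN_TIMEOUT=60   # the timeout must not be set to less than 60 seconds

# lv_timeout_cmds [seconds]
# Reads "vgdisplay -v <vg>" output on stdin and prints one
# "lvchange -t <seconds> <lv>" command per logical volume. It does not
# run lvchange itself, so the commands can be reviewed first.
lv_timeout_cmds() {
    secs=${1:-$MIN_TIMEOUT}
    case "$secs" in
        ''|*[!0-9]*)
            echo "error: timeout must be a whole number of seconds" >&2
            return 1 ;;
    esac
    if [ "$secs" -lt "$MIN_TIMEOUT" ]; then
        echo "error: timeout must be at least $MIN_TIMEOUT seconds" >&2
        return 1
    fi
    # Match on fields, so leading indentation in the output does not matter.
    awk -v t="$secs" '$1 == "LV" && $2 == "Name" { print "lvchange -t " t " " $3 }'
}

# check_lv_timeout
# Reads "lvdisplay <lv>" output for one logical volume on stdin and prints:
#   default  - no timeout set; LVM retries forever
#   too-low  - timeout is below MIN_TIMEOUT
#   ok       - timeout is at least MIN_TIMEOUT
#   unknown  - no "IO Timeout (Seconds)" line, or an unexpected value
check_lv_timeout() {
    val=$(awk '$1 == "IO" && $2 == "Timeout" && $3 == "(Seconds)" { print $4; exit }')
    case "$val" in
        default) echo default ;;
        ''|*[!0-9]*) echo unknown ;;
        *)
            if [ "$val" -lt "$MIN_TIMEOUT" ]; then
                echo too-low
            else
                echo ok
            fi ;;
    esac
}

# Example use on one host (the volume group name is hypothetical):
#   vgdisplay -v /dev/vgpkg1 | lv_timeout_cmds 60 > /tmp/lv_timeout.sh
#   cat /tmp/lv_timeout.sh        # review the generated commands
#   sh /tmp/lv_timeout.sh         # apply them
#   lvdisplay /dev/vgpkg1/lvol1 | check_lv_timeout
```

Printing the commands rather than executing them makes it easy to review the list and to run it from a single host, as described above; the other hosts then inherit the new timeout.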