Veritas Volume Manager 4.1 Administrator's Guide (HP-UX 11i v3, February 2007)

Chapter 13, Administering Cluster Functionality
Cluster Initialization and Configuration
The cluster functionality of VxVM maintains global state information for each volume.
This enables VxVM to determine which volumes need to be recovered when a node
crashes. When a node leaves the cluster because of a crash, or by any other means that is
not clean, VxVM determines which volumes may have incomplete writes, and the master
node resynchronizes these volumes. The resynchronization can use dirty region logging
(DRL) or FastResync if these are active for any of the volumes.
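The bookkeeping described above can be illustrated with a small model. This is a conceptual sketch in Python, not VxVM code: the ClusterVolumeState class and its method names are hypothetical, and the sketch assumes that the only state tracked is which nodes currently have each volume open.

```python
from collections import defaultdict


class ClusterVolumeState:
    """Toy model of cluster-wide volume state (hypothetical, not a VxVM API)."""

    def __init__(self):
        # Volume name -> set of node names that currently have it open.
        self._open_on = defaultdict(set)

    def open_volume(self, node, volume):
        self._open_on[volume].add(node)

    def close_volume(self, node, volume):
        self._open_on[volume].discard(node)

    def node_left(self, node, clean):
        """Return the sorted list of volumes to resynchronize after `node` leaves."""
        affected = sorted(v for v, nodes in self._open_on.items() if node in nodes)
        for v in affected:
            self._open_on[v].discard(node)
        # Assumption: a clean leave means the volumes were closed first, so
        # nothing needs recovery; an unclean leave may have interrupted writes.
        return [] if clean else affected


state = ClusterVolumeState()
state.open_volume("node1", "vol01")
state.open_volume("node2", "vol01")
state.open_volume("node2", "vol02")
state.close_volume("node2", "vol02")

# node2 crashes while it still has vol01 open.
to_resync = state.node_left("node2", clean=False)
print(to_resync)
```

In this model only vol01 is reported, because node2 had already closed vol02 before it left.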
Clean node shutdown must be used after, or in conjunction with, a procedure to halt all
cluster applications. Depending on the characteristics of the clustered application and its
shutdown procedure, a successful shutdown can take a long time (minutes to hours).
For instance, many applications have the concept of draining, where they accept no new
work, but complete any work in progress before exiting. This process can take a long time
if, for example, a long-running transaction is active.
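The draining behavior mentioned above can be sketched as follows. This is an illustrative Python model of a generic application, not part of VxVM; the DrainingWorker class and the job names are hypothetical.

```python
from collections import deque


class DrainingWorker:
    """Hypothetical application that drains its work queue before exiting."""

    def __init__(self):
        self._queue = deque()
        self._draining = False
        self.finished = []

    def submit(self, job):
        # Once draining has started, new work is refused.
        if self._draining:
            return False
        self._queue.append(job)
        return True

    def drain(self):
        # Stop accepting work, then complete everything already queued.
        self._draining = True
        while self._queue:
            self.finished.append(self._queue.popleft())


worker = DrainingWorker()
worker.submit("txn-1")
worker.submit("txn-2")
worker.drain()
accepted = worker.submit("txn-3")
print(worker.finished, accepted)
```

The time spent in drain() depends entirely on the work already queued, which is why a single long-running transaction can delay a clean shutdown.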
When the VxVM shutdown procedure is invoked, it checks all volumes in all shared disk
groups on the node that is being shut down. The procedure then either continues with the
shutdown or waits, depending on the state of those volumes:
- If all volumes in shared disk groups are closed, VxVM makes them unavailable to
  applications. Because all nodes are informed that these volumes are closed on the
  leaving node, no resynchronization is performed.
- If any volume in a shared disk group is open, the shutdown operation in the kernel
  waits until the volume is closed. There is no timeout checking in this operation.
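The two cases above can be sketched in a few lines of Python. This is a conceptual model only: SharedVolume and shutdown_node are hypothetical names, and a threading.Event stands in for whatever mechanism the kernel actually uses to wait for a volume to close.

```python
import threading


class SharedVolume:
    """Hypothetical stand-in for a volume in a shared disk group."""

    def __init__(self, name, is_open=False):
        self.name = name
        # The event is set while the volume is closed.
        self._closed = threading.Event()
        if not is_open:
            self._closed.set()

    def close(self):
        self._closed.set()

    def wait_until_closed(self):
        # Deliberately no timeout, mirroring the behavior described above.
        self._closed.wait()


def shutdown_node(volumes):
    """Block until every shared volume is closed, then list them as unavailable."""
    for vol in volumes:
        vol.wait_until_closed()
    return [vol.name for vol in volumes]


vols = [SharedVolume("vol01"), SharedVolume("vol02", is_open=True)]
# Simulate an application closing vol02 shortly after shutdown begins.
threading.Timer(0.1, vols[1].close).start()
unavailable = shutdown_node(vols)
print(unavailable)
```

If nothing ever closed vol02, shutdown_node would block indefinitely, which is the practical consequence of there being no timeout.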
Note: Once shutdown succeeds, the node has left the cluster. It is not possible to access the
shared volumes until the node joins the cluster again.
Because shutdown can be a lengthy process, other reconfiguration can take place while
shutdown is in progress. Normally, the shutdown attempt is suspended until the other
reconfiguration completes. However, if the shutdown is already too far advanced, it may
complete first.
Node Abort
A node fails to leave a cluster cleanly either because it crashed or because some cluster
component forced it to leave on an emergency basis. The ensuing cluster
reconfiguration calls the VxVM abort function. This procedure immediately attempts to
halt all access to shared volumes, although it does wait until pending I/O from or to the
disk completes.
I/O operations that have not yet been started are failed, and the shared volumes are
removed. Applications that were accessing the shared volumes therefore fail with errors.
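The treatment of I/O during an abort can be modeled as a simple state transition. This Python sketch is illustrative only: the IoState and IoRequest types and the abort_node function are hypothetical, and the sketch assumes that I/O already issued to the disk completes successfully.

```python
from dataclasses import dataclass
from enum import Enum


class IoState(Enum):
    QUEUED = "queued"        # not yet issued to the disk
    IN_FLIGHT = "in_flight"  # already issued, awaiting completion
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class IoRequest:
    ident: int
    state: IoState


def abort_node(requests):
    """Apply the abort rules to I/O requests on shared volumes."""
    for req in requests:
        if req.state is IoState.IN_FLIGHT:
            # Pending I/O is allowed to finish (assumed successful here).
            req.state = IoState.COMPLETED
        elif req.state is IoState.QUEUED:
            # I/O that has not started is failed back to the application.
            req.state = IoState.FAILED
    return requests


reqs = [IoRequest(1, IoState.IN_FLIGHT), IoRequest(2, IoState.QUEUED)]
result = abort_node(reqs)
print([(r.ident, r.state.value) for r in result])
```

Request 1 completes and request 2 is failed, which corresponds to the application-visible errors described above.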
After a node abort or crash, shared volumes must be recovered, either by a surviving node
or by a subsequent cluster restart, because it is very likely that there are unsynchronized
mirrors.