VERITAS Volume Manager 3.5 Troubleshooting Guide (September 2004)

Recovery from Hardware Failure
Failures on RAID-5 Volumes
Chapter 1
option. If the -o iosize option is not specified, the default maximum I/O size is used.
The resync operation then moves onto the next region until the entire length of the
RAID-5 volume has been resynchronized.
For larger volumes, parity regeneration can take a long time. It is possible that the
system could be shut down or crash before the operation is completed. In case of a
system shutdown, the progress of parity regeneration must be kept across reboots.
Otherwise, the process has to start all over again.
To avoid restarting from the beginning, parity regeneration is checkpointed. This means
that the offset up to which the parity has been regenerated is saved in the configuration
database. The -o checkpt=size option controls how often the checkpoint is saved. If the
option is not specified, the default checkpoint size is used.
Because saving the checkpoint offset requires a transaction, making the checkpoint size
too small can extend the time required to regenerate parity. After a system reboot, a
RAID-5 volume that has a checkpoint offset smaller than the volume length starts a
parity resynchronization at the checkpoint offset.
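The trade-off between checkpoint size and regeneration time can be illustrated with simple arithmetic. The following is a minimal sketch in plain shell, not VxVM code: it does not call any VxVM command, the function names and the volume length are hypothetical, and it assumes that a checkpoint is saved each time the regenerated offset reaches a whole multiple of the checkpoint size.

```shell
#!/bin/sh
# Illustrative model only; these functions do not call any VxVM command.
# Assumption: a checkpoint is saved each time the regenerated offset
# reaches a whole multiple of the checkpoint size.

# checkpoint_count LENGTH CHECKPT
# Number of checkpoint transactions needed to cover LENGTH (rounded up).
checkpoint_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# resume_offset REACHED CHECKPT
# Offset at which the resync restarts if the system goes down after parity
# has been regenerated up to REACHED: the last saved checkpoint.
resume_offset() {
    echo $(( ($1 / $2) * $2 ))
}

# Hypothetical volume length of 1000000 units.
echo "checkpt=1000:   $(checkpoint_count 1000000 1000) transactions"
echo "checkpt=100000: $(checkpoint_count 1000000 100000) transactions"
echo "stop at 456789 with checkpt=100000 resumes at $(resume_offset 456789 100000)"
```

In this model, a smaller checkpoint size means more transactions during regeneration but less repeated work after a reboot. A larger checkpoint size reverses the trade-off.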
Log Plex Recovery
RAID-5 log plexes can become detached due to disk failures. These RAID-5 logs can be
reattached by using the att keyword for the vxplex command. To reattach the failed
RAID-5 log plex, use the following command:
# vxplex att r5vol r5vol-l1
Stale Subdisk Recovery
Stale subdisk recovery is usually done at volume start time. However, the process doing
the recovery can crash, or the volume may be started with an option such as -o
delayrecover that prevents subdisk recovery. In addition, the disk on which the
subdisk resides can be replaced without recovery operations being performed. In such
cases, you can perform subdisk recovery using the vxvol recover command. For
example, to recover the stale subdisk in the RAID-5 volume shown in Figure 1-3,
“Invalid RAID-5 Volume,” use the following command:
# vxvol recover r5vol disk01-00
A RAID-5 volume that has multiple stale subdisks can be recovered in one operation. To
recover multiple stale subdisks, use the vxvol recover command on the volume, as
follows:
# vxvol recover r5vol
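The two forms of the command can be wrapped in a small dry-run helper that prints the command lines for review instead of executing them. This is a sketch only: the function name recover_cmds is hypothetical, and it emits one command per subdisk because that is the only per-subdisk form shown above.

```shell
#!/bin/sh
# Dry-run helper: prints vxvol recover command lines without running them.
# recover_cmds VOLUME [SUBDISK...]   (hypothetical helper name)
recover_cmds() {
    vol=$1
    shift
    if [ $# -eq 0 ]; then
        # No subdisk named: recover all stale subdisks in the volume.
        echo "vxvol recover $vol"
    else
        # One command per named subdisk, matching the form shown above.
        for sd in "$@"; do
            echo "vxvol recover $vol $sd"
        done
    fi
}

recover_cmds r5vol disk01-00
recover_cmds r5vol
```

Review the printed lines before running them as root on the affected system.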
Recovery After Moving RAID-5 Subdisks
When RAID-5 subdisks are moved and replaced, the new subdisks are marked as
STALE in anticipation of recovery. If the volume is active, the vxsd command may be
used to recover the volume. If the volume is not active, it is recovered when it is next
started. The RAID-5 volume is degraded for the duration of the recovery operation.
Any failure in the stripes involved in the move makes the volume unusable. The RAID-5
volume can also become invalid if its parity becomes stale. To prevent this, vxsd
does not allow a subdisk move in the following situations:
a stale subdisk occupies any of the same stripes as the subdisk being moved