Veritas Volume Manager 5.0 Troubleshooting Guide (September 2006)
Recovery from hardware failure
Failures on RAID-5 volumes
v  r5vol      -        ENABLED NEEDSYNC 204800 RAID -       raid5
pl r5vol-01   r5vol    ENABLED ACTIVE   204800 RAID 3/16    RW
sd disk01-01  r5vol-01 disk01  0        102400 0/0  c2t9d0  ENA
sd disk02-01  r5vol-01 disk02  0        102400 1/0  c2t10d0 dS
sd disk03-01  r5vol-01 disk03  0        102400 2/0  c2t11d0 ENA
...
This output lists the volume state as NEEDSYNC, indicating that the parity needs to be
resynchronized. The state could also have been SYNC, indicating that a synchronization
was attempted at start time and that a synchronization process should already be running.
If no such process exists, or if the volume is in the NEEDSYNC state, you can start a
synchronization manually by using the resync keyword for the vxvol command. For example,
to resynchronize the RAID-5 volume in the figure “Invalid RAID-5 volume” on page 24, use
the following command:
# vxvol -g mydg resync r5vol
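The state check described above can be scripted. The following is a minimal sketch, not part of the product. It assumes that vxprint record lines have the field layout of the sample output (record type in field 1, name in field 2, state in field 5). The function name volumes_needing_resync is hypothetical, and the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: read vxprint-style records on stdin and print the
# names of volumes (record type "v") whose state field is NEEDSYNC.
# Assumed field layout: type, name, -, kernel state, state, ...
volumes_needing_resync() {
    awk '$1 == "v" && $5 == "NEEDSYNC" { print $2 }'
}

# Sample records taken from the output above (no VxVM required).
sample='v r5vol - ENABLED NEEDSYNC 204800 RAID - raid5
pl r5vol-01 r5vol ENABLED ACTIVE 204800 RAID 3/16 RW'

# Dry run: print the resync command for each matching volume.
printf '%s\n' "$sample" | volumes_needing_resync | while read -r vol; do
    echo "vxvol -g mydg resync $vol"
done
```

On a live system, the sample text would be replaced by real vxprint output for the disk group, for example from vxprint -g mydg -ht. Check the field positions against that output before relying on the filter.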
Parity is regenerated by issuing VOL_R5_RESYNC ioctls to the RAID-5 volume. The
resynchronization process starts at the beginning of the RAID-5 volume and
resynchronizes a region equal to the number of sectors specified by the -o iosize option.
If the -o iosize option is not specified, the default maximum I/O size is used. The
resync operation then moves on to the next region until the entire length of the RAID-5
volume has been resynchronized.
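The region-by-region pass can be pictured with a small model. This is only an illustrative sketch of the stepping described above and does not issue any ioctls. The function name resync_regions is hypothetical, and the region size of 65536 sectors is an assumed value, not the documented default.

```shell
#!/bin/sh
# Hypothetical model of the resync pass: print "start length" (in sectors)
# for each region of a volume of vol_len sectors, where each step covers
# at most io_size sectors.
resync_regions() {
    vol_len=$1
    io_size=$2
    offset=0
    while [ "$offset" -lt "$vol_len" ]; do
        remaining=$((vol_len - offset))
        if [ "$remaining" -lt "$io_size" ]; then
            len=$remaining      # final, shorter region
        else
            len=$io_size
        fi
        echo "$offset $len"
        offset=$((offset + len))
    done
}

# 204800 is the r5vol length from the sample output; 65536 is an assumed
# illustrative region size.
resync_regions 204800 65536
```

With these values the model steps through four regions, the last of which covers only the remaining 8192 sectors.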
For larger volumes, parity regeneration can take a long time. It is possible that the system
could be shut down or crash before the operation is completed. In case of a system
shutdown, the progress of parity regeneration must be kept across reboots. Otherwise, the
process has to start all over again.
To avoid starting over from the beginning, parity regeneration is checkpointed. This means that the
offset up to which the parity has been regenerated is saved in the configuration database.
The -o checkpt=size option controls how often the checkpoint is saved. If the option is
not specified, the default checkpoint size is used.
Because saving the checkpoint offset requires a transaction, making the checkpoint size
too small can extend the time required to regenerate parity. After a system reboot, a
RAID-5 volume that has a checkpoint offset smaller than the volume length starts a parity
resynchronization at the checkpoint offset.
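The trade-off in checkpoint size can be made concrete with some arithmetic. The sketch below is a simplified model that assumes checkpoints are saved at exact multiples of the checkpoint size; the actual behavior of VxVM may differ. The function names, the interruption point of 150000 sectors and both checkpoint sizes are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical model: given how far regeneration had progressed (sectors)
# and the checkpoint interval, print the last saved offset, which is where
# regeneration would resume after a reboot.
resume_offset() {
    progress=$1
    interval=$2
    echo $(( (progress / interval) * interval ))
}

# Number of checkpoint saves (transactions) needed to cover a whole volume.
checkpoint_count() {
    vol_len=$1
    interval=$2
    echo $(( (vol_len + interval - 1) / interval ))
}

# Assumed scenario: 204800-sector volume (length from the sample output),
# interrupted after 150000 sectors, compared for two checkpoint sizes.
for checkpt in 2048 65536; do
    resume=$(resume_offset 150000 "$checkpt")
    echo "checkpt=$checkpt saves=$(checkpoint_count 204800 "$checkpt")" \
         "resume_at=$resume redo=$((150000 - resume))"
done
```

In this model the small checkpoint size costs 100 transactions but repeats only 496 sectors after the interruption, while the large one costs 4 transactions and repeats 18928 sectors.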
Log plex recovery
RAID-5 log plexes can become detached due to disk failures. These RAID-5 logs can be
reattached by using the att keyword for the vxplex command. To reattach the failed
RAID-5 log plex, use the following command:
# vxplex -g mydg att r5vol r5vol-l1