VERITAS Volume Manager 3.5 Troubleshooting Guide (September 2004)

Recovery from Hardware Failure
Failures on RAID-5 Volumes
Chapter 1
If hot-relocation is enabled at the time of a disk failure, system administrator
intervention is not required unless no suitable disk space is available for relocation.
Hot-relocation is triggered by the failure and the system administrator is notified of the
failure by electronic mail.
Hot-relocation automatically attempts to relocate the subdisks of a failing RAID-5 plex.
After any relocation takes place, the hot-relocation daemon (vxrelocd) also initiates a
parity resynchronization.
In the case of a failing RAID-5 log plex, relocation occurs only if the log plex is mirrored;
the vxrelocd daemon then initiates a mirror resynchronization to recreate the RAID-5
log plex. If hot-relocation is disabled at the time of a failure, the system administrator
may need to initiate a resynchronization or recovery.
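Because the behavior described above depends on whether the vxrelocd daemon is running, it can be useful to check the process table before relying on hot-relocation. The following is a minimal sketch rather than a procedure from this guide: the function name relocd_running is hypothetical, and it assumes only that the daemon appears under the name vxrelocd in ps -ef output.

```shell
# Hypothetical helper: reads `ps -ef` output on standard input and
# succeeds (exit status 0) if any line mentions the vxrelocd daemon.
relocd_running() {
    # Ignore lines produced by awk or grep commands that merely search
    # for the name; exit 0 if a match was found, 1 otherwise.
    awk '/vxrelocd/ && !/awk/ && !/grep/ { found = 1 }
         END { exit(found ? 0 : 1) }'
}

# Example use on a live system (assumes ps -ef is available):
#   if ps -ef | relocd_running; then
#       echo "hot-relocation daemon is running"
#   else
#       echo "vxrelocd not found; manual recovery may be needed after a failure"
#   fi
```

Reading the process list from standard input keeps the check separate from the ps invocation, so the same function can be run against saved output.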
NOTE Following severe hardware failure of several disks or other related subsystems
underlying a RAID-5 plex, it may be impossible to recover the volume using the methods
described in this chapter. In this case, remove the volume, recreate it on hardware that is
functioning correctly, and restore the contents of the volume from a backup.
Parity Resynchronization
In most cases, a RAID-5 array does not have stale parity. Stale parity occurs only after
all RAID-5 log plexes for the RAID-5 volume have failed, and then only if there is a
system failure. Even if a RAID-5 volume has stale parity, it is usually repaired as part of
the volume start process.
If a volume without valid RAID-5 logs is started and the process is killed before the
volume is resynchronized, the result is an active volume with stale parity. The following
example shows the output of the vxprint -ht command for a RAID-5 volume with stale
parity:
V  NAME        RVG      KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME        VOLUME   KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME        PLEX     DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME        PLEX     VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
...
v  r5vol       -        ENABLED  NEEDSYNC 204800   RAID      -        raid5
pl r5vol-01    r5vol    ENABLED  ACTIVE   204800   RAID      3/16     RW
sd disk01-01   r5vol-01 disk01   0        102400   0/0       c2t9d0   ENA
sd disk02-01   r5vol-01 disk02   0        102400   1/0       c2t10d0  dS
sd disk03-01   r5vol-01 disk03   0        102400   2/0       c2t11d0  ENA
...
This output lists the volume state as NEEDSYNC, indicating that the parity needs to be
resynchronized. The state could also have been SYNC, indicating that a synchronization
was attempted at start time and that a synchronization process should still be running.
If no such process exists, or if the volume is in the NEEDSYNC state, a
synchronization can be started manually by using the resync keyword for the vxvol
command. For example, to resynchronize the RAID-5 volume in Figure 1-3, “Invalid
RAID-5 Volume,” use the following command:
# vxvol resync r5vol
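The check-then-resynchronize sequence described above can be sketched as a small shell helper. This is an illustration under stated assumptions, not a VERITAS-supplied script: the function name needsync_volumes is hypothetical, and it assumes the vxprint -ht column layout shown in the example output (record type in column 1, volume name in column 2, STATE in column 5).

```shell
# Hypothetical helper: reads `vxprint -ht` output on standard input and
# prints the name of every volume record whose STATE column is NEEDSYNC.
needsync_volumes() {
    # Volume records begin with a lowercase "v"; the header lines begin
    # with uppercase "V", "PL", "SD", or "SV" and are therefore skipped.
    awk '$1 == "v" && $5 == "NEEDSYNC" { print $2 }'
}

# Example use on a live system (commands as used in this section):
#   for vol in $(vxprint -ht | needsync_volumes); do
#       echo "resynchronizing $vol"
#       vxvol resync "$vol"
#   done
```

Volumes in the SYNC state are deliberately not selected, since a synchronization process may already be working on them. For volumes outside the default disk group, the disk group would presumably also have to be named with the -g option.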
Parity is regenerated by issuing VOL_R5_RESYNC ioctls to the RAID-5 volume. The
resynchronization process starts at the beginning of the RAID-5 volume and
resynchronizes a region equal to the number of sectors specified by the -o
iosize