VERITAS Volume Manager 4.1 Troubleshooting Guide

Recovery from Hardware Failure

Failures on RAID-5 Volumes

Chapter 1

If a volume without valid RAID-5 logs is started and the process is killed before the volume is resynchronized, the result is an active volume with stale parity. The following example shows the output of the vxprint -ht command for such a stale RAID-5 volume:

V  NAME       RVG       KSTATE   STATE     LENGTH  READPOL    PREFPLEX  UTYPE
PL NAME       VOLUME    KSTATE   STATE     LENGTH  LAYOUT     NCOL/WID  MODE
SD NAME       PLEX      DISK     DISKOFFS  LENGTH  [COL/]OFF  DEVICE    MODE
SV NAME       PLEX      VOLNAME  NVOLLAYR  LENGTH  [COL/]OFF  AM/NM     MODE
...
v  r5vol      -         ENABLED  NEEDSYNC  204800  RAID       -         raid5
pl r5vol-01   r5vol     ENABLED  ACTIVE    204800  RAID       3/16      RW
sd disk01-01  r5vol-01  disk01   0         102400  0/0        c2t9d0    ENA
sd disk02-01  r5vol-01  disk02   0         102400  1/0        c2t10d0   dS
sd disk03-01  r5vol-01  disk03   0         102400  2/0        c2t11d0   ENA
...
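Because the NEEDSYNC and SYNC states can be spotted mechanically in a listing like this one, a small filter can help on systems with many volumes. The Python sketch below is illustrative only. It assumes the vxprint -ht text has already been captured as a string and that volume records follow the column order of the V header; VolumeRecord, parse_volume_records, and stale_parity_volumes are hypothetical helpers, not part of VxVM.

```python
from typing import NamedTuple


class VolumeRecord(NamedTuple):
    name: str
    kstate: str
    state: str
    length: int
    utype: str


def parse_volume_records(vxprint_output: str) -> list[VolumeRecord]:
    """Collect the 'v' (volume) records from vxprint -ht style text.

    Assumes whitespace-separated columns in the order given by the V header:
    NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE.
    """
    records = []
    for line in vxprint_output.splitlines():
        fields = line.split()
        # Lowercase "v" marks a volume record; uppercase "V" is the header line.
        if len(fields) == 9 and fields[0] == "v":
            _, name, _rvg, kstate, state, length, _readpol, _prefplex, utype = fields
            records.append(VolumeRecord(name, kstate, state, int(length), utype))
    return records


def stale_parity_volumes(records: list[VolumeRecord]) -> list[str]:
    """Names of RAID-5 volumes whose parity may still need resynchronizing.

    NEEDSYNC means parity must be resynchronized; SYNC means a resync was
    attempted at start time, so it is flagged for a check that a resync
    process really exists.
    """
    return [r.name for r in records
            if r.utype == "raid5" and r.state in ("NEEDSYNC", "SYNC")]


SAMPLE = """\
v  r5vol      -        ENABLED  NEEDSYNC 204800 RAID  -      raid5
pl r5vol-01   r5vol    ENABLED  ACTIVE   204800 RAID  3/16   RW
sd disk01-01  r5vol-01 disk01   0        102400 0/0   c2t9d0 ENA
"""

print(stale_parity_volumes(parse_volume_records(SAMPLE)))  # ['r5vol']
```

Plex (pl) and subdisk (sd) lines are skipped because only records whose first field is "v" are treated as volumes.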

This output lists the volume state as NEEDSYNC, indicating that the parity needs to be resynchronized. The state could also have been SYNC, indicating that a synchronization was attempted at start time and that a synchronization process should now be performing it. If no such process exists, or if the volume is in the NEEDSYNC state, a synchronization can be started manually by using the resync keyword of the vxvol command. For example, to resynchronize the RAID-5 volume in Figure 1-3, “Invalid RAID-5 Volume,” use the following command:

# vxvol -g mydg resync r5vol
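The decision rule in the preceding paragraph can be expressed as a small Python function, together with the argument vector for the command shown above. This is a sketch under assumptions: needs_manual_resync and resync_command are hypothetical names, and how the caller determines whether a resynchronization process is running is not modeled here.

```python
def needs_manual_resync(state: str, resync_process_running: bool) -> bool:
    """Resync by hand if the volume is NEEDSYNC, or if it is SYNC but no
    resynchronization process is actually running."""
    if state == "NEEDSYNC":
        return True
    if state == "SYNC":
        return not resync_process_running
    return False


def resync_command(diskgroup: str, volume: str) -> list[str]:
    """Argument vector for: vxvol -g <diskgroup> resync <volume>."""
    return ["vxvol", "-g", diskgroup, "resync", volume]


if needs_manual_resync("NEEDSYNC", resync_process_running=False):
    print(" ".join(resync_command("mydg", "r5vol")))  # vxvol -g mydg resync r5vol
```

On a host with VxVM installed, the list could be passed to Python's subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell quoting issues with disk group or volume names.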

Parity is regenerated by issuing VOL_R5_RESYNC ioctls to the RAID-5 volume. The resynchronization process starts at the beginning of the RAID-5 volume and resynchronizes a region equal to the number of sectors specified by the -o iosize option. If the -o iosize option is not specified, the default maximum I/O size is used. The resync operation then moves on to the next region until the entire length of the RAID-5 volume has been resynchronized.
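The region-by-region order described above can be sketched as a Python generator. This is an illustration only: it computes the (offset, length) pairs in sectors but does not issue VOL_R5_RESYNC ioctls, resync_regions is a hypothetical name, and the iosize of 65536 sectors in the example is an arbitrary value, not the VxVM default.

```python
from typing import Iterator


def resync_regions(volume_length: int, iosize: int) -> Iterator[tuple[int, int]]:
    """Yield (offset, length) pairs, in sectors, that cover the whole volume.

    Starts at offset 0, advances one iosize-sized region at a time, and
    shortens the final region when volume_length is not a multiple of iosize.
    """
    if volume_length < 0 or iosize <= 0:
        raise ValueError("volume_length must be >= 0 and iosize must be > 0")
    offset = 0
    while offset < volume_length:
        size = min(iosize, volume_length - offset)
        yield offset, size
        offset += size


# The 204800-sector r5vol from the vxprint example, with a hypothetical iosize.
for offset, size in resync_regions(204800, 65536):
    print(offset, size)
# 0 65536
# 65536 65536
# 131072 65536
# 196608 8192
```

A larger iosize means fewer, larger requests; the last region simply covers whatever remains of the volume.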

For larger volumes, parity regeneration can take a long time, and the system could be shut down or crash before the operation completes. The progress of parity regeneration must therefore be preserved across reboots; otherwise, the process has to start all over again.

To avoid restarting from the beginning, parity regeneration is checkpointed. This means that the offset up to which the parity has been regenerated is saved in the configuration database. The -o checkpt=size option controls how often the checkpoint is saved. If the option is not specified, the default checkpoint size is used.
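The checkpointing scheme can be simulated in Python to show what survives a crash. This is a sketch under stated assumptions: a plain dict stands in for the configuration database, and the function name regenerate_parity, the "checkpoint" key, and the crash_after parameter are all hypothetical. VxVM's actual transaction mechanism is not modeled.

```python
from typing import Optional


def regenerate_parity(volume_length: int, iosize: int, checkpt: int,
                      config_db: dict, crash_after: Optional[int] = None) -> list[int]:
    """Simulate checkpointed parity regeneration over a volume, in sectors.

    config_db is a plain dict standing in for the configuration database;
    its hypothetical "checkpoint" key holds the offset up to which parity is
    known to be regenerated. Work resumes from that offset if present.
    crash_after, if given, stops the run after that many regions to mimic a
    crash. Returns the offsets saved, one per checkpoint transaction.
    """
    if iosize <= 0 or checkpt <= 0:
        raise ValueError("iosize and checkpt must be positive")
    offset = config_db.get("checkpoint", 0)
    last_saved = offset
    saves = []
    regions_done = 0
    while offset < volume_length:
        if crash_after is not None and regions_done == crash_after:
            return saves  # simulated crash: progress past last_saved is lost
        size = min(iosize, volume_length - offset)
        # A real implementation would regenerate parity for
        # [offset, offset + size) here; this sketch only tracks progress.
        offset += size
        regions_done += 1
        # Save once checkpt sectors have completed since the last save,
        # and always record completion at the end of the volume.
        if offset - last_saved >= checkpt or offset == volume_length:
            config_db["checkpoint"] = offset
            last_saved = offset
            saves.append(offset)
    return saves


db = {}
regenerate_parity(204800, 16384, 65536, db, crash_after=6)
print(db)                                           # {'checkpoint': 65536}
print(regenerate_parity(204800, 16384, 65536, db))  # [131072, 196608, 204800]
```

After the simulated crash at sector 98304, only the saved offset 65536 survives, so the second run redoes sectors 65536 to 98304 rather than the whole volume. Saving after at least checkpt sectors have completed is one plausible reading of "how often the checkpoint is saved", not a description of VxVM internals.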

Because saving the checkpoint offset requires a transaction, making the checkpoint size too small can extend the time required to regenerate parity. After a system reboot, a RAID-5 volume that has a checkpoint offset smaller than the volume length starts a parity resynchronization at the checkpoint offset.
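The trade-off and the restart rule can be checked with a little arithmetic. The Python sketch below assumes one checkpoint transaction per checkpt sectors completed, plus one for any remainder (ceiling division); checkpoint_transactions and resume_offset are hypothetical names, and the checkpoint sizes in the example are arbitrary, not VxVM defaults.

```python
from typing import Optional


def checkpoint_transactions(volume_length: int, checkpt: int) -> int:
    """Checkpoint saves for one full pass: ceil(volume_length / checkpt)."""
    if checkpt <= 0:
        raise ValueError("checkpt must be positive")
    return -(-volume_length // checkpt)  # integer ceiling division


def resume_offset(volume_length: int, checkpoint_offset: int) -> Optional[int]:
    """Where parity resync restarts after a reboot, or None if it is complete."""
    return checkpoint_offset if checkpoint_offset < volume_length else None


# The 204800-sector r5vol with three hypothetical checkpoint sizes.
for checkpt in (128, 2048, 65536):
    print(checkpt, checkpoint_transactions(204800, checkpt))
# 128 1600
# 2048 100
# 65536 4

print(resume_offset(204800, 65536))   # 65536
print(resume_offset(204800, 204800))  # None
```

Under this model, a small checkpoint size multiplies the number of transactions, while a large one means that up to roughly one checkpoint interval of work may be repeated after a reboot.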