VERITAS Volume Manager 4.1 Troubleshooting Guide

Recovery from Hardware Failure
Failures on RAID-5 Volumes
Chapter 1
If a loss of sync occurs while a RAID-5 volume is being accessed, the volume is described as
having stale parity. The parity must then be reconstructed by reading all the non-parity
columns within each stripe, recalculating the parity, and writing out the parity stripe unit in
the stripe. This must be done for every stripe in the volume, so it can take a long time to
complete.
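The parity recalculation described above can be illustrated with a minimal sketch. It assumes that parity is the byte-wise XOR of the data stripe units in a stripe, which is the conventional RAID-5 scheme, and it models the volume as an in-memory list. The names xor_units and resync_parity and the stripe layout are hypothetical and are not part of Volume Manager.

```python
from functools import reduce


def xor_units(units):
    """Return the byte-wise XOR of a list of equally sized stripe units."""
    if not units:
        raise ValueError("need at least one stripe unit")
    size = len(units[0])
    if any(len(unit) != size for unit in units):
        raise ValueError("stripe units must all be the same size")
    # zip(*units) yields one tuple per byte offset, holding that byte
    # from every unit; XOR-ing the tuple gives the parity byte.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def resync_parity(stripes):
    """Recalculate and rewrite the parity unit of every stripe.

    Each stripe is a hypothetical dict of the form
    {"data": [bytes, ...], "parity": bytes}.
    Returns the number of stripes processed.
    """
    for stripe in stripes:
        # Read all the non-parity columns, recalculate, write the parity unit.
        stripe["parity"] = xor_units(stripe["data"])
    return len(stripes)


# Toy volume: the first stripe has stale parity, the second is already correct.
volume = [
    {"data": [b"\x01\x02", b"\x04\x08", b"\x10\x20"], "parity": b"\x00\x00"},
    {"data": [b"\xff", b"\x0f"], "parity": b"\xf0"},
]
processed = resync_parity(volume)
```

Because the loop visits every stripe regardless of whether its parity was already correct, the cost grows with the size of the volume, which is why a full resynchronization can take a long time.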
CAUTION While a RAID-5 volume without log plexes is being resynchronized,
the failure of any disk within the volume causes the data in the volume to be lost.
Besides the vulnerability to failure, the resynchronization process can tax the system
resources and slow down system operation.
RAID-5 logs reduce the damage that can be caused by system failures, because they maintain
a copy of the data being written at the time of the failure. The process of resynchronization
consists of reading that data and parity from the logs and writing it to the appropriate areas
of the RAID-5 volume. This greatly reduces the amount of time needed for a
resynchronization of data and parity. It also means that the volume never becomes truly stale.
The data and parity for all stripes in the volume are known at all times, so the failure of a
single disk cannot result in the loss of the data within the volume.
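The log-based resynchronization described above can be pictured with a simplified model. This is a sketch under stated assumptions, not the Volume Manager implementation: the log is modeled as a list of entries, each holding the data and parity for one write that was in flight at the time of the failure. The name replay_log and the entry layout are hypothetical.

```python
def replay_log(volume, log):
    """Write logged data and parity back to the volume.

    volume: list of stripes, each {"data": [bytes, ...], "parity": bytes}
    log:    list of entries, each
            {"stripe": int, "data": {column: bytes}, "parity": bytes}
    Returns the number of log entries replayed.
    """
    for entry in log:
        stripe = volume[entry["stripe"]]
        # Rewrite the data stripe units that were being written.
        for column, unit in entry["data"].items():
            stripe["data"][column] = unit
        # Rewrite the matching parity stripe unit.
        stripe["parity"] = entry["parity"]
    replayed = len(log)
    log.clear()  # the replayed writes are no longer pending
    return replayed


# Toy example: the data unit reached the disk before the failure,
# but the parity unit did not, so the stripe is inconsistent.
volume = [{"data": [b"\x01", b"\x06"], "parity": b"\x03"}]
log = [{"stripe": 0, "data": {1: b"\x06"}, "parity": b"\x07"}]
replayed = replay_log(volume, log)
```

Only the stripes named in the log are touched, in contrast to a full parity reconstruction, which has to read every stripe in the volume.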
Disk Failures
An uncorrectable I/O error occurs when disk failure, cabling or other problems cause the data
on a disk to become unavailable. For a RAID-5 volume, this means that a subdisk becomes
unavailable. The subdisk cannot be used to hold data and is considered stale and detached. If
the underlying disk becomes available or is replaced, the subdisk is still considered stale and
is not used.
If an attempt is made to read data contained on a stale subdisk, the data is reconstructed
from data on all other stripe units in the stripe. This operation is called a reconstructing-read.
This is a more expensive operation than simply reading the data and can result in degraded
read performance. When a RAID-5 volume has stale subdisks, it is considered to be in
degraded mode.
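A reconstructing-read can be sketched as follows, assuming conventional RAID-5 XOR parity, so that any one stripe unit equals the XOR of all the other stripe units (data and parity) in the same stripe. The function name reconstructing_read and the list-based stripe are hypothetical and serve only as an illustration.

```python
def reconstructing_read(stripe_units, stale_index):
    """Rebuild the stripe unit at stale_index from all other units.

    stripe_units lists every unit in the stripe, data and parity alike.
    The entry at stale_index is ignored because it is unavailable.
    """
    if not 0 <= stale_index < len(stripe_units):
        raise IndexError("stale_index is outside the stripe")
    survivors = [u for i, u in enumerate(stripe_units) if i != stale_index]
    if not survivors:
        raise ValueError("no surviving stripe units to reconstruct from")
    rebuilt = bytearray(len(survivors[0]))
    for unit in survivors:
        if len(unit) != len(rebuilt):
            raise ValueError("stripe units must all be the same size")
        # Every surviving unit must be read, which is why this costs
        # more than an ordinary read of a single stripe unit.
        for offset, byte in enumerate(unit):
            rebuilt[offset] ^= byte
    return bytes(rebuilt)


# Three data units plus parity; column 1 sits on the stale subdisk.
stripe = [b"\x01\x02", None, b"\x10\x20", b"\x15\x2a"]
recovered = reconstructing_read(stripe, 1)
```

A normal read touches one stripe unit, whereas this sketch reads every surviving unit in the stripe, which illustrates the degraded read performance.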
A RAID-5 volume in degraded mode can be recognized from the output of the vxprint -ht
command as shown in the following display:
V  NAME   RVG/VSET/CO  KSTATE   STATE     LENGTH  READPOL    PREFPLEX  UTYPE
PL NAME   VOLUME       KSTATE   STATE     LENGTH  LAYOUT     NCOL/WID  MODE
SD NAME   PLEX         DISK     DISKOFFS  LENGTH  [COL/]OFF  DEVICE    MODE
SV NAME   PLEX         VOLNAME  NVOLLAYR  LENGTH  [COL/]OFF  AM/NM     MODE
...