Veritas Volume Manager 5.0 Troubleshooting Guide, HP-UX 11i v3, First Edition, May 2008

Recovery from hardware failure

You can use the command vxreattach -c to check whether reattachment is
possible without performing the operation. Instead of reattaching the disk,
the command displays the disk group and disk media name where the disk can
be reattached.

See the vxreattach(1M) manual page for more information on the vxreattach
command.

Failures on RAID-5 volumes

Failures come in two varieties: system failures and disk failures. A system
failure means that the system has abruptly ceased to operate due to an
operating system panic or power failure. A disk failure means that the data
on some number of disks has become unavailable due to a hardware failure
(such as a head crash, electronics failure on disk, or disk controller
failure).

System failures

RAID-5 volumes are designed to remain available, with a minimum of disk
space overhead, if there are disk failures. However, many forms of RAID-5
can lose data after a system failure. Data loss occurs because a system
failure causes the data and parity in the RAID-5 volume to become
unsynchronized. Loss of synchronization occurs because the status of writes
that were outstanding at the time of the failure cannot be determined.

If a loss of synchronization occurs while a RAID-5 volume is being accessed,
the volume is described as having stale parity. The parity must then be
reconstructed by reading all the non-parity columns within each stripe,
recalculating the parity, and writing out the parity stripe unit in the
stripe. This must be done for every stripe in the volume, so it can take a
long time to complete.
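
The parity reconstruction described above can be sketched in a few lines of
Python. This is an illustration only, not VxVM code: it assumes the usual
RAID-5 convention that the parity stripe unit is the bytewise XOR of the data
stripe units, and the names xor_blocks and resync_parity are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def resync_parity(stripes):
    """Recompute the parity stripe unit for every stripe.

    Each stripe is a list of its non-parity (data) stripe units.
    Every stripe must be read in full, which is why a complete
    resynchronization is slow on a large volume.
    """
    return [xor_blocks(data_units) for data_units in stripes]

# Hypothetical volume: two stripes, three data columns, two bytes per unit.
stripes = [
    [b"\x01\x02", b"\x04\x08", b"\x10\x20"],
    [b"\xff\x00", b"\x0f\xf0", b"\x00\x00"],
]
parity = resync_parity(stripes)
```

The work grows with the number of stripes, not with the number of writes that
were in flight, because nothing records which stripes were affected.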

Caution: While a RAID-5 volume without log plexes is being resynchronized,
any failure of a disk within the volume causes data in the volume to be
lost.

Besides the vulnerability to failure, the resynchronization process can tax
the system resources and slow down system operation.
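
The vulnerability named in the caution above can be illustrated with a single
stripe. The sketch below is not VxVM code. It assumes XOR parity, and the
byte values and variable names are made up for the example. It shows that a
column rebuilt from stale parity does not match the data that was lost.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe with three data stripe units and parity that is in sync.
d0, d1, d2 = b"\x11", b"\x22", b"\x44"
parity = xor_blocks([d0, d1, d2])

# A write replaces d1, but the system fails before the parity is updated,
# so the parity on disk is now stale.
d1_new = b"\x2f"
stale_parity = parity

# Before resynchronization finishes, the disk holding d0 fails.
# Rebuilding d0 from the surviving columns and the stale parity
# produces the wrong contents.
rebuilt_d0 = xor_blocks([d1_new, d2, stale_parity])

# With parity that had been brought back in sync, the rebuild would succeed.
fresh_parity = xor_blocks([d0, d1_new, d2])
rebuilt_ok = xor_blocks([d1_new, d2, fresh_parity])
```

Here rebuilt_d0 differs from d0 while rebuilt_ok equals it, which is why a
disk failure during resynchronization is so costly.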

RAID-5 logs reduce the damage that can be caused by system failures, because
they maintain a copy of the data being written at the time of the failure.
The process of resynchronization consists of reading that data and parity
from the logs and writing them to the appropriate areas of the RAID-5
volume. This greatly reduces the amount of time needed for a
resynchronization of data and parity. It also means that the volume never
becomes truly stale. The data and parity for all stripes in the volume are
known at all times, so the failure of a single disk cannot result in the
loss of the data within the volume.
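
Log-based resynchronization can be sketched as a replay loop. This is a
simplified model and does not reflect the actual on-disk format of a VxVM
RAID-5 log: the dictionary layout, the replay_log function, and the sample
values are all assumptions made for illustration, and XOR parity is assumed.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def replay_log(volume, log):
    """Write logged data and parity back to the stripes they belong to.

    Returns the number of stripes rewritten. Only stripes with logged
    writes are touched, so the cost does not depend on volume size.
    """
    for entry in log:
        stripe = volume[entry["stripe"]]
        for column, unit in entry["data"].items():
            stripe["data"][column] = unit
        stripe["parity"] = entry["parity"]
    return len(log)

# Hypothetical volume state after a crash: stripe 1 received new data in
# column 0, but its parity still reflects the old data (0x05 ^ 0x06).
volume = [
    {"data": [b"\x01", b"\x02"], "parity": b"\x03"},
    {"data": [b"\x0a", b"\x06"], "parity": b"\x03"},
]

# The log kept a copy of the data and parity that were being written.
log = [{"stripe": 1, "data": {0: b"\x0a"}, "parity": b"\x0c"}]

rewritten = replay_log(volume, log)
in_sync = all(xor_blocks(s["data"]) == s["parity"] for s in volume)
```

Only one stripe is rewritten here, in contrast to a full parity
reconstruction, which has to read every stripe in the volume.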