LVM Online Disk Replacement (LVM OLR)

When replacing a disk requires halting applications and restoring data from a backup
Replacing a disk requires halting applications and restoring data from a backup under the
following circumstances:
• If the disk being replaced contains any unmirrored logical volumes
• If any data in the mirrored logical volumes on the disk being replaced has been
  compromised due to simultaneous disk failures
If any of the logical volumes on the disk being replaced are not mirrored, that disk holds
the only copy of their data; by definition there are no mirror copies elsewhere. Replacing
the disk therefore requires halting the applications and filesystems using the disk and
restoring the data from backup.
If all the logical volumes on the disk being replaced are mirrored and none have been
compromised due to simultaneous disk failures, it is not necessary to halt applications using
the logical volumes and there is no need to restore any data to the logical volumes from a
backup. Once the disk is replaced and attached to the volume group again, LVM will
automatically resynchronize the data on the disk.
The data in a mirrored logical volume is compromised when no remaining available disk in
the volume group contains a non-stale copy of the data. Prior to replacing a disk, it is
important to ensure that for each extent in each logical volume on the disk being replaced
there is a non-stale copy on a different available disk. The lvdisplay(1M) -v command
displays each of the logical volumes and the state of all the extents within them. If any
extent does not have a non-stale copy on some other available disk, halt any applications
using the data and restore the data from backup after replacing the disk.
For example, suppose a logical volume is mirrored across two disks, A and B. If disk A
contains some stale extents and disk B fails, the logical volume is compromised. The
cause was a partial failure of disk A (perhaps a few I/O requests timed out) coincident with
a complete failure of B, before A could be resynchronized from B. Displaying the logical
volume shows that although disk A is available, it contains some stale data. Although disk
B may appear to have a non-stale copy of all the data, that disk is down, so the copy there
is unavailable. If disk B is then replaced, the data on the replacement disk cannot be
automatically synchronized from disk A, because some extents on disk A are stale and the
only non-stale copies of those extents were on the failed disk B.
Under these circumstances, applications using the logical volume must be halted prior to
replacing the disk, and after the disk is replaced, the data in the logical volume must be
restored from backup.
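The per-extent check described above can be sketched in a few lines of Python. This is
only an illustration of the rule, not an LVM tool: the function name compromised_extents,
the data layout, and the status strings "current" and "stale" are assumptions made for the
example, and the extent map is written by hand rather than parsed from lvdisplay(1M) -v
output.

```python
from typing import AbstractSet, Dict, List, Tuple

# One logical extent is modeled as a list of mirror copies, each a
# (physical_volume, status) pair. The status values "current" and "stale"
# are assumed labels for this illustration.
Copies = List[Tuple[str, str]]


def compromised_extents(
    extents: Dict[int, Copies],
    replacing: str,
    unavailable: AbstractSet[str] = frozenset(),
) -> List[int]:
    """Return the logical extents that have no non-stale copy on a
    different, available disk than the one being replaced."""
    at_risk = []
    for le, copies in sorted(extents.items()):
        has_good_copy = any(
            pv != replacing and pv not in unavailable and status == "current"
            for pv, status in copies
        )
        if not has_good_copy:
            at_risk.append(le)
    return at_risk


# Hypothetical two-disk mirror: disk A holds a stale copy of extent 1,
# and disk B (which holds current copies of everything) has failed.
lv = {
    0: [("A", "current"), ("B", "current")],
    1: [("A", "stale"), ("B", "current")],
}

# Replacing B: extent 1 has no usable copy left, so a restore is required.
print(compromised_extents(lv, replacing="B"))

# Replacing A instead, with B still available: nothing is at risk.
print(compromised_extents(lv, replacing="A"))
```

An empty result means every extent has a usable copy elsewhere, so the disk can be
replaced without halting applications. An unmirrored extent falls out of the same check,
since its single copy lives on the disk being replaced.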
It is important to note that simultaneous disk failures are rare, provided that the disks
and I/O hardware in the volume group are properly isolated and promptly replaced at the
first sign of failure. In essence, a mirrored volume that has only a single available copy
of any of its data has already weathered one or more failures and is susceptible to being
compromised by just one subsequent disk failure.