Specifications

Device Mapper Multipath I/O Configurations
Transient path failures
While running IO tests on Device Mapper Multipath devices, it is not uncommon
for actions on the SAN, for example, a server rebooting, to cause paths to be
temporarily reported as failed. In most cases, only one path fails and the
remaining paths continue to carry IOs, so nothing is observed other than a
small performance impact. In some cases, multiple paths can be reported as
failed, leaving no working paths. This can cause an application, such as a file
system or database, to see IO errors. Device Mapper Multipath and the vendor
support for it have improved considerably to eliminate these failures.
However, they can still be seen at times. To avoid these situations, consider
these actions:
1. Verify that the multipath configuration is set correctly per the instructions of
the disk array vendor.
2. Check the setting of the "failback" feature. This feature determines how
quickly a path is reactivated after failing and being repaired. A setting of
immediate indicates to resume use of a path as soon as it comes back
online. A setting of an integer indicates the number of seconds after a path
comes back online to resume using it. A setting of 10 to 15 generally
provides sufficient settle time to avoid thrashing on the SAN.
3. Check the setting of the "no_path_retry" feature. This feature determines
what Device Mapper Multipath should do if all paths fail. We recommend a
setting of 10 to 15. This allows some ability to "ride out" temporary events
where all paths fail while still providing a reasonable recovery time. The
LifeKeeper DMMP kit monitors IOs to the storage, and if they are not
answered within four minutes, LifeKeeper switches the resources to the
standby server. NOTE: LifeKeeper does not recommend setting "no_path_retry"
to "queue" since this results in IOs that are not easily killed. The only
mechanism found to kill them is available on newer versions of DM, where the
settings of the device can be changed:
/sbin/dmsetup message -u 'DMid' 0 fail_if_no_path
This temporarily changes the setting for no_path_retry to fail, causing any
outstanding IOs to fail. However, multipathd can reset no_path_retry to the
default at times. When the setting is changed to fail_if_no_path to flush
failed IOs, it should then be reset to its default prior to accessing the device
(manually or via LifeKeeper).
If "no_path_retry" is set to "queue" and a failure occurs, LifeKeeper will
switch the resources over to the standby server. However, LifeKeeper will
not kill these IOs. The recommended method to clear these IOs is a reboot,
but they can also be cleared by an administrator using the dmsetup
command above. If the IOs are not cleared, data corruption can occur
if/when the resources are taken out of service on the other server, thereby
releasing the locks and allowing the "old" IOs to be issued.
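As a rough illustration of items 2 and 3 above, the two settings might appear in /etc/multipath.conf as shown here. This is only a sketch under stated assumptions: the value 10 is simply picked from the recommended range of 10 to 15, and placing the settings in the defaults section is an assumption. The disk array vendor's instructions, which may place them in a device-specific section instead, take precedence.

```
# Sketch only: values are taken from the 10 to 15 range recommended above.
# Using the defaults section is an assumption; follow the array vendor's
# instructions if they specify a device-specific section or other values.
defaults {
    failback        10    # wait 10 seconds before resuming use of a restored path
    no_path_retry   10    # retry a limited number of times when all paths fail (not "queue")
}
```

The multipath daemon generally needs to reload its configuration before an edited file takes effect.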