HP-UX 11i v3 Native Multi-Pathing for Mass Storage (August 2012)
Path recovery policies
When a disk device no longer responds to I/O operations sent through a lunpath for a configurable
period of time (path_fail_secs attribute), this lunpath is considered offline, and is no longer used
for I/O transfer. Prior to the March 2008 release of HP-UX 11i v3, the system automatically monitored
the lunpath by periodically sending INQUIRY requests to the device through it, and declared the lunpath
back online when the INQUIRY requests succeeded.
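The offline decision and the pre-March 2008 recovery behavior described above can be modeled roughly as follows. This is a minimal Python sketch for illustration only, not HP-UX code. The LunPath class, its fields, and its methods are hypothetical. The sketch assumes that the path_fail_secs window starts with the first unanswered I/O and is reset by any answered I/O.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LunPath:
    """Hypothetical model of a single lunpath (not an HP-UX kernel structure)."""
    name: str
    path_fail_secs: float                       # configured unresponsiveness window, in seconds
    online: bool = True
    unresponsive_since: Optional[float] = None  # time of the first unanswered I/O in the current run

    def record_io(self, now: float, responded: bool) -> None:
        """Update the path after an I/O sent at time `now` was answered or not."""
        if responded:
            self.unresponsive_since = None      # device answered: reset the window
            return
        if self.unresponsive_since is None:
            self.unresponsive_since = now
        if now - self.unresponsive_since >= self.path_fail_secs:
            self.online = False                 # offline: no longer used for I/O transfer

    def record_inquiry(self, succeeded: bool) -> None:
        """Pre-March 2008 behavior: a successful INQUIRY brings an offline path back online."""
        if not self.online and succeeded:
            self.online = True
            self.unresponsive_since = None
```

For example, with path_fail_secs set to 120, unanswered I/O at t=0 and t=60 leaves the path online. A further unanswered I/O at t=120 takes it offline. A later successful INQUIRY brings it back online.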
Starting with the March 2008 release of HP-UX 11i v3, you have more control over automatic path
recovery. You can configure path ping policies and path recovery policies.
The following path ping policies can be configured:
• None — No connectivity test is performed when a lunpath is declared offline.
• Basic — The connectivity of the lunpath is tested by periodically sending INQUIRY SCSI commands
to the device through the lunpath.
• Extended — The connectivity of the lunpath is tested by periodically sending INQUIRY and TUR
(Test Unit Ready) SCSI commands through the lunpath.
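The three ping policies differ only in which SCSI commands make up one connectivity test. The Python sketch below models that difference. The PingPolicy enum and the ping_commands and ping functions are hypothetical names used for illustration. The send callback stands in for issuing a command through the offline lunpath.

```python
from enum import Enum
from typing import Callable, List, Optional


class PingPolicy(Enum):
    NONE = "none"
    BASIC = "basic"
    EXTENDED = "extended"


def ping_commands(policy: PingPolicy) -> List[str]:
    """SCSI commands sent through an offline lunpath for one connectivity test."""
    if policy is PingPolicy.NONE:
        return []                                # no connectivity test is performed
    if policy is PingPolicy.BASIC:
        return ["INQUIRY"]
    return ["INQUIRY", "TEST UNIT READY"]        # EXTENDED


def ping(policy: PingPolicy, send: Callable[[str], bool]) -> Optional[bool]:
    """Run one connectivity test; `send(cmd)` returns True if the command succeeds.

    Returns None when the policy performs no test. Otherwise returns True only
    if every command in the test succeeded, stopping at the first failure.
    """
    commands = ping_commands(policy)
    if not commands:
        return None
    return all(send(cmd) for cmd in commands)
```

In this model, a device that answers INQUIRY but fails TEST UNIT READY passes a Basic ping but fails an Extended one.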
The following path recovery policies can be configured:
• Immediate — The lunpath is considered back online upon a successful connectivity test (ping).
• Time-based threshold — The lunpath is considered back online after connectivity tests have succeeded
consecutively, without interruption, for a configured period of time.
• Count-based threshold — The lunpath is considered back online after a specified number of
consecutive successful connectivity tests.
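The three recovery policies can be expressed as one decision function that consumes ping results. The following Python sketch is illustrative only. RecoveryTracker and its count_threshold and time_threshold parameters are hypothetical names, not HP-UX attributes. The sketch assumes that any failed ping resets both the consecutive-success count and the start of the sustained-success period.

```python
from enum import Enum
from typing import Optional


class RecoveryPolicy(Enum):
    IMMEDIATE = "immediate"
    TIME_THRESHOLD = "time"
    COUNT_THRESHOLD = "count"


class RecoveryTracker:
    """Hypothetical per-lunpath tracker deciding when an offline path may come back online."""

    def __init__(self, policy: RecoveryPolicy, count_threshold: int = 1,
                 time_threshold: float = 0.0) -> None:
        self.policy = policy
        self.count_threshold = count_threshold   # consecutive successes required (count-based)
        self.time_threshold = time_threshold     # seconds of sustained success required (time-based)
        self.consecutive_ok = 0
        self.first_ok_time: Optional[float] = None

    def on_ping(self, now: float, ok: bool) -> bool:
        """Feed one ping result at time `now`; return True if the path may be declared online."""
        if not ok:
            # Any failure breaks the run of consecutive successes.
            self.consecutive_ok = 0
            self.first_ok_time = None
            return False
        self.consecutive_ok += 1
        if self.first_ok_time is None:
            self.first_ok_time = now
        if self.policy is RecoveryPolicy.IMMEDIATE:
            return True
        if self.policy is RecoveryPolicy.COUNT_THRESHOLD:
            return self.consecutive_ok >= self.count_threshold
        # TIME_THRESHOLD: successes must have been sustained for time_threshold seconds.
        return now - self.first_ok_time >= self.time_threshold
```

With a count threshold of 3, the ping sequence success, success, failure, success, success, success declares the path online only on the last ping, because the failure restarts the count.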
LUN failure management
Managing loss of accessibility to a LUN
When all the lunpaths to a LUN are taken offline, the LUN becomes inaccessible. The mass storage
subsystem still accepts I/O operations to this LUN during a transient grace period. This
mechanism gives the LUN time to recover and shields applications from transient SAN
conditions that may cause a temporary loss of accessibility to the LUN. If at least one lunpath
comes back online before the end of the grace period, incoming I/O operations can flow again. After
this grace period expires, pending I/O operations and further incoming I/O operations to this LUN
are returned with a failure indication until a lunpath comes back online. Administrators can also
tune the length of the transient grace period, so that the high-availability behavior of the mass
storage subsystem can be matched to the local SAN.
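The grace-period logic above amounts to a three-way decision for each I/O operation. The Python sketch below illustrates it. The LunState class, the grace_secs parameter, and the "dispatch", "hold", and "fail" outcomes are hypothetical names chosen for illustration. In this model, the grace period starts when the last lunpath goes offline.

```python
from typing import Optional


class LunState:
    """Hypothetical model of a LUN's accessibility and its transient grace period."""

    def __init__(self, grace_secs: float) -> None:
        self.grace_secs = grace_secs                   # tunable transient grace period
        self.all_offline_since: Optional[float] = None

    def update_paths(self, now: float, any_path_online: bool) -> None:
        """Record whether at least one lunpath to the LUN is online at time `now`."""
        if any_path_online:
            self.all_offline_since = None              # LUN accessible again
        elif self.all_offline_since is None:
            self.all_offline_since = now               # grace period starts

    def io_disposition(self, now: float) -> str:
        """Decide what happens to an I/O operation for this LUN at time `now`."""
        if self.all_offline_since is None:
            return "dispatch"                          # a lunpath is online: send the I/O
        if now - self.all_offline_since < self.grace_secs:
            return "hold"                              # within grace period: accept and keep pending
        return "fail"                                  # grace period expired: return a failure
```

For a 60-second grace period that starts at t=10, an I/O at t=30 is held and an I/O at t=70 fails. Once a lunpath comes back online, I/O is dispatched again.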
Managing authentication failure
Whenever a LUN is opened, the SCSI stack verifies that the LUN represented by the DSF is still
the same device. The goal is to prevent data on a physical LUN from being corrupted by mistake
when LUN devices are swapped. The authentication consists of reading the WWID of the LUN from
the LUN device and comparing it with the WWID that was associated with the DSF when the DSF was
created.
If a change in LUN identity is detected, a LUN authentication failure message is printed on the
console. To prevent data corruption, any pending I/O operation to the LUN is failed, and no further
I/O operations can be sent to the LUN until the LUN is properly re-authenticated or until the
administrator changes the LUN identifier. System administrators can use the scsimgr replace_wwid
command to change the identifier associated with a given LUN DSF (see the scsimgr SCSI Management
and Diagnostics Utility white paper).
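The authentication check and its failure handling can be summarized with a small Python model. The LunDsf class, its methods, and the WWID strings in the example are hypothetical. The replace_wwid method only stands in for the effect of the administrator running scsimgr replace_wwid. It does not reproduce that command's syntax or behavior.

```python
class LunDsf:
    """Hypothetical model of a LUN DSF and the WWID recorded when the DSF was created."""

    def __init__(self, dsf_name: str, wwid: str) -> None:
        self.dsf_name = dsf_name
        self.wwid = wwid            # WWID associated with the DSF at creation time
        self.auth_failed = False

    def open(self, device_wwid: str) -> bool:
        """Authenticate on open by comparing the device's WWID with the recorded one."""
        if device_wwid != self.wwid:
            print(f"LUN authentication failure on {self.dsf_name}")
            self.auth_failed = True  # pending and further I/O is refused
            return False
        self.auth_failed = False     # re-authenticated properly
        return True

    def can_do_io(self) -> bool:
        return not self.auth_failed

    def replace_wwid(self, device_wwid: str) -> None:
        """Stand-in for the administrator accepting the new device's identifier."""
        self.wwid = device_wwid
        self.auth_failed = False
```

If a swapped device reports a different WWID, open fails and I/O is blocked. I/O resumes either when the original device returns or after the identifier is replaced.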