HP-UX 11i v3 Native Multi-Pathing for Mass Storage (August 2012)

Path recovery policies

When a disk device no longer responds to I/O operations sent through a lunpath for a configurable

period of time (path_fail_secs attribute), this lunpath is considered offline, and is no longer used

for I/O transfer. Prior to the March 2008 release of HP-UX 11i v3, the system automatically monitored

the lunpath by periodically sending inquiry requests to the device through it, and declared the lunpath

back online when the inquiry requests succeed.

Starting with the March 2008 release of HP-UX 11i v3, you have more control over automatic path

recovery. You can configure path ping policies and path recovery policies.

The following path ping policies can be configured:

• None — No connectivity test is performed when a lunpath is declared offline.

• Basic — The connectivity of the lunpath is tested by periodically sending INQUIRY SCSI commands

to the device through the lunpath.

• Extended — The connectivity of the lunpath is tested by periodically sending INQUIRY and TUR

(Test Unit Ready) SCSI commands through the lunpath.

The following path recovery policies can be configured:

• Immediate — The lunpath is considered back online upon a successful connectivity test (ping).

• Time-based threshold — The lunpath is considered back online upon sustained and consecutive

successful connectivity tests during a configured period of time.

• Count-based threshold — The lunpath is considered back online after a specified number of

consecutive successful connectivity tests.
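
The three recovery policies differ only in how many successful pings, or how long a run of them, is required before the lunpath is trusted again. The following is a minimal sketch of that decision logic, not the HP-UX implementation: the names RecoveryPolicy and recovery_time, the policy labels, and the ping timings are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class RecoveryPolicy:
    """Hypothetical model of a path recovery policy (not an HP-UX API)."""
    kind: str               # "immediate", "time", or "count"
    threshold: float = 0.0  # seconds for "time", number of pings for "count"


def recovery_time(policy: RecoveryPolicy,
                  pings: Iterable[Tuple[float, bool]]) -> Optional[float]:
    """Return the timestamp at which the lunpath would be declared online.

    `pings` holds (timestamp_in_seconds, succeeded) results in chronological
    order. Returns None if the policy is never satisfied.
    """
    streak_start = None  # timestamp of the first success in the current streak
    streak_len = 0       # number of consecutive successes so far
    for ts, ok in pings:
        if not ok:
            # A failed ping breaks the streak of consecutive successes.
            streak_start, streak_len = None, 0
            continue
        if streak_start is None:
            streak_start = ts
        streak_len += 1
        if policy.kind == "immediate":
            return ts
        if policy.kind == "count" and streak_len >= policy.threshold:
            return ts
        if policy.kind == "time" and ts - streak_start >= policy.threshold:
            return ts
    return None


# Assumed scenario: one ping every 10 seconds, and the ping at t=20 fails.
results = [(0, True), (10, True), (20, False),
           (30, True), (40, True), (50, True)]

print(recovery_time(RecoveryPolicy("immediate"), results))  # 0
print(recovery_time(RecoveryPolicy("count", 3), results))   # 50
print(recovery_time(RecoveryPolicy("time", 20), results))   # 50
```

In this scenario the immediate policy brings the lunpath back at the first successful ping, while both threshold policies wait until the failure at t=20 is followed by a sustained run of successes. That is the trade-off the thresholds offer: slower recovery in exchange for not reusing a flapping path.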

LUN failure management

Managing loss of accessibility to a LUN

When all the lunpaths to a LUN are taken offline, the LUN becomes inaccessible. The mass storage

subsystem still accepts I/O operations to this LUN during a transient grace period. This

mechanism gives some time for the LUN to recover and shields applications from SAN transient

conditions that may cause a temporary loss of accessibility to the LUN. If at least one lunpath

becomes active before the end of the grace period, incoming I/O operations can flow again. After

this grace period expires, pending I/O operations and further incoming I/O operations to this LUN

are returned with a failure indication until a lunpath comes back online. Administrators can tune the

length of the transient grace period, which lets the mass storage subsystem provide high availability

customized to the local SAN.
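
The effect of the grace period on an individual I/O operation can be illustrated with a small model. This is a sketch under stated assumptions, not the behavior of any HP-UX interface: the function io_outcome, its parameters, the outcome labels, and the 120-second grace period used in the examples are hypothetical.

```python
from typing import Optional


def io_outcome(io_time: float,
               all_paths_down_at: float,
               grace_secs: float,
               path_back_at: Optional[float]) -> str:
    """Classify an I/O issued at `io_time` (all times in seconds).

    `all_paths_down_at` is when the last lunpath went offline, and
    `path_back_at` is when a lunpath came back online (None if never).
    Returns "sent", "held_then_sent", or "failed".
    """
    if io_time < all_paths_down_at:
        return "sent"  # the LUN was still accessible
    if path_back_at is not None and io_time >= path_back_at:
        return "sent"  # a lunpath is online again
    grace_end = all_paths_down_at + grace_secs
    if path_back_at is not None and path_back_at <= grace_end:
        # Accepted during the grace period; flows once the lunpath returns.
        return "held_then_sent"
    # No lunpath recovered within the grace period: I/O pending at expiry
    # and I/O arriving after expiry are both returned with a failure.
    return "failed"


# All lunpaths go offline at t=10; the assumed grace period is 120 seconds.
print(io_outcome(50, 10, 120, path_back_at=100))   # held_then_sent
print(io_outcome(50, 10, 120, path_back_at=None))  # failed
print(io_outcome(350, 10, 120, path_back_at=300))  # sent
```

A longer grace period shields applications from longer SAN disturbances, at the cost of delaying the failure indication when the LUN is truly gone.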

Managing authentication failure

Whenever a LUN is opened, the SCSI stack authenticates that the LUN represented by the DSF is still

the same device. The goal is to prevent data written on the physical LUN device from being corrupted

by mistake when LUN devices are swapped. The authentication consists of getting the WWID of the

LUN from the LUN device and comparing it with the WWID associated with the DSF at the time of the

DSF creation.

If any change in LUN behavior is detected, a LUN authentication failure message is printed on the console. To

prevent data corruption, any pending I/O operation to the LUN is failed and no further I/O operation

can be sent to the LUN until the LUN is re-authenticated properly or until the LUN identifier is changed

by the administrator. System administrators can use scsimgr replace_wwid to change the identifier

associated with a given LUN DSF (see the scsimgr SCSI Management and Diagnostics Utility white paper).
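
The authentication check described above amounts to comparing the WWID read from the device at open time with the WWID recorded when the DSF was created. The sketch below models that comparison and the effect of re-binding the identifier. It is illustrative only: the Dsf class, open_lun, replace_wwid, and the WWID strings are hypothetical, and the replace_wwid function here only mimics the outcome of the scsimgr replace_wwid command, not its implementation or syntax.

```python
from typing import Callable


class LunAuthenticationError(Exception):
    """Raised when the device behind a DSF is not the one it was bound to."""


class Dsf:
    """Hypothetical model of a LUN device special file (DSF)."""

    def __init__(self, name: str, wwid: str) -> None:
        self.name = name
        self.bound_wwid = wwid  # WWID recorded when the DSF was created
        self.blocked = False    # True while I/O to the LUN is refused


def open_lun(dsf: Dsf, read_wwid: Callable[[], str]) -> str:
    """Authenticate the LUN on open by comparing WWIDs."""
    current = read_wwid()  # WWID reported by the device right now
    if current != dsf.bound_wwid:
        dsf.blocked = True  # refuse I/O to prevent data corruption
        raise LunAuthenticationError(
            f"{dsf.name}: expected {dsf.bound_wwid}, found {current}")
    dsf.blocked = False  # the LUN re-authenticated properly
    return current


def replace_wwid(dsf: Dsf, read_wwid: Callable[[], str]) -> None:
    """Model the administrator accepting the device now behind the DSF."""
    dsf.bound_wwid = read_wwid()
    dsf.blocked = False


dsf = Dsf("disk_a", "wwid-A")
open_lun(dsf, lambda: "wwid-A")      # same device: the open succeeds

try:
    open_lun(dsf, lambda: "wwid-B")  # device swapped: the open is refused
except LunAuthenticationError as err:
    print("authentication failure:", err)

replace_wwid(dsf, lambda: "wwid-B")  # administrator confirms the swap
open_lun(dsf, lambda: "wwid-B")      # the open succeeds again
```

Requiring an explicit administrator action before the new WWID is accepted is what keeps an accidental swap from silently redirecting writes to the wrong device.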