HP-UX 11i v3 Native Multi-Pathing for Mass Storage (August 2012)

messages in the system log, then the SCSI subsystem resorts to path failover and recovery mechanisms
to provide applications with continuous access to the LUN end devices. Administrator intervention is
required only when certain critical errors occur.
In summary, after detecting a SCSI component error, the operating system reports the error to system
administrators and offers a palette of proactive recovery actions: automatic path failover or dynamic
replacement of the failing component.
Path error reporting
HP-UX 11i v3 offers the following new and enhanced mechanisms to report failures on SCSI
components. The goal is to help administrators diagnose problems quickly and efficiently so that they
can take the most appropriate action.
Error messages: A comprehensive set of error messages of various severity levels is used to
report a wide range of errors. These messages can be monitored in syslog and STM.
Statistics: A detailed set of statistics is available for each SCSI component to help with
troubleshooting and to quickly identify a faulty component. Administrators can use scsimgr to
display these statistics.
EVM events: The mass storage subsystem generates events to which other modules in the kernel
or user space can subscribe, to get notified about changes to LUN and lunpath properties. The
SCSI stack monitors every LUN and every lunpath availability change. It also monitors LUN property
changes such as LUN size.
I/O error triggered events: The mass storage subsystem also reports failures on the lunpaths and
LUN upon detection of certain I/O errors.
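
The subscription idea behind the event mechanism described above can be pictured with a toy
publish/subscribe model. The Python sketch below is not the EVM API and does not use real event
names: EventBus, the event type strings, and the lunpath and LUN names are all hypothetical and
serve only to show how a subscriber is notified about an availability change while ignoring
events it did not subscribe to.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Callback = Callable[[Event], None]


class EventBus:
    """Toy stand-in for an event channel (hypothetical, not the EVM interface)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callback) -> None:
        """Register a callback for one event type."""
        self._subscribers[event_type].append(callback)

    def publish(self, event_type: str, **details: str) -> int:
        """Deliver the event to its subscribers and return how many were notified."""
        event = {"type": event_type, **details}
        callbacks = self._subscribers[event_type]
        for callback in callbacks:
            callback(event)
        return len(callbacks)


bus = EventBus()
seen: List[Event] = []

# A hypothetical module interested only in lunpath availability changes.
bus.subscribe("lunpath_availability", seen.append)

bus.publish("lunpath_availability", lunpath="lunpath7", state="offline")
bus.publish("lun_size_change", lun="disk4", size_blocks="2097152")  # no subscriber

print(seen)
```

Only the availability event reaches the subscriber; the size change event is published but has
no listener in this example.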
Path failover
When a lunpath goes offline, I/O operations issued on that lunpath fail. How the mass storage
subsystem handles this scenario depends on the following factors:
Path bound I/O operations
User applications can request the mass storage subsystem to issue I/O operations on a specific
lunpath. Such I/O operations are called path bound I/O operations. When a path bound I/O fails,
the SCSI stack retries it a certain number of times on the same lunpath before failing it back to the
upper layer (for example, applications, file systems, or volume managers). There is no path failover.
I/O operations sent to a LUN using the path lock down load balancing policy are path bound I/O
operations.
I/O operations not bound to a path
Path failover is applied to I/O operations that are not bound to a lunpath. When such an I/O
operation fails and can be retried (see I/O retry policy below), it is failed over to the next
lunpath selected by the path selection algorithm of the LUN load balancing policy.
I/O retry policy
The mass storage subsystem retries a failing I/O operation a certain number of times before
returning failure to the application using one of the following retry policies:
Time based: The mass storage subsystem retries the I/O operation within a certain time interval,
which is either set by upper layer modules such as the volume manager or the file system, or
determined by a default LUN attribute. For disk LUNs, the esd_secs LUN attribute holds the time
credit for an I/O operation across different retries. For tape LUNs, the read_secs and
write_secs attributes hold the read and write time credits.
Count based: The number of retries of the I/O operation is kept below a high-water mark
threshold. For disk LUNs, I/O operations can be retried indefinitely (if the disk LUN
infinite_retries_enable attribute is set), or a finite number of times (corresponding to the
disk LUN max_retries attribute value).
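
The failover and retry rules above can be pictured with a small simulation. The Python sketch
below is an illustrative model only, not HP-UX code: RetryBudget, issue_io, the lunpath names,
the round-robin rotation standing in for the load balancing policy, and the fixed simulated cost
of one second per attempt are all assumptions introduced here. The max_retries and
time_credit_secs fields loosely mirror the max_retries and esd_secs attributes, and leaving
max_retries unset plays the role of infinite_retries_enable.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RetryBudget:
    """Hypothetical retry budget combining the count based and time based policies."""
    max_retries: Optional[int] = None         # count based; None models infinite retries
    time_credit_secs: Optional[float] = None  # time based; None means no time limit

    def allows_retry(self, retries_done: int, elapsed_secs: float) -> bool:
        if self.max_retries is not None and retries_done >= self.max_retries:
            return False
        if self.time_credit_secs is not None and elapsed_secs >= self.time_credit_secs:
            return False
        return True


def issue_io(
    paths: List[str],
    online: Dict[str, bool],
    budget: RetryBudget,
    bound_path: Optional[str] = None,
    secs_per_attempt: float = 1.0,
) -> Tuple[str, List[str]]:
    """Simulate one I/O; return its status and the lunpath used for each attempt.

    A path bound I/O is always retried on bound_path (no failover). Any other
    I/O moves to the next lunpath in the rotation after each failure.
    Note: with no limits set and every lunpath offline, this loops forever.
    """
    rotation = itertools.cycle(paths)  # assumed stand-in for path selection
    trail: List[str] = []
    retries_done = 0
    elapsed_secs = 0.0
    while True:
        path = bound_path if bound_path is not None else next(rotation)
        trail.append(path)
        elapsed_secs += secs_per_attempt  # simulated time, no real waiting
        if online[path]:
            return "ok", trail
        if not budget.allows_retry(retries_done, elapsed_secs):
            return "failed", trail
        retries_done += 1


paths = ["lunpath0", "lunpath1", "lunpath2"]
online = {"lunpath0": False, "lunpath1": True, "lunpath2": True}

# Not bound to a path: the failed attempt on lunpath0 fails over to lunpath1.
print(issue_io(paths, online, RetryBudget(max_retries=3)))

# Path bound: every retry stays on lunpath0 until the budget is exhausted.
print(issue_io(paths, online, RetryBudget(max_retries=2), bound_path="lunpath0"))
```

In this model the first call succeeds on its second attempt after failing over, while the path
bound call is attempted three times in total on the same lunpath and then fails back to the
caller.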