HP StorageWorks HSG60 and HSG80 Array Controller and Array Controller Software Troubleshooting Guide (EK-G80TS-SA. C01, March 2005)

Alternative Controller Operations
Handling host-configured units in error
Host-configured units require additional maintenance when a unit is in
error. Highly functional host operating systems, such as Tru64 UNIX using
Logical Storage Manager (LSM), provide redundancy for storage volumes. The
host periodically runs internal checks to maintain a viable path to each unit.
While maintaining a viable path to a unit that is in error, the host might fail
to disengage the storage unit and resume error-free operations. For example:
- In several instances, while using LSM with Tru64 UNIX, the host does not
  fail over to the backup controller in a timely manner after a unit becomes
  inoperative. These instances usually involve non-redundant storage on
  HSG80 array controllers that LSM configures as redundant storage
  through the host.
- In other instances, array controllers do not discontinue attempts to perform
  I/O to the unit. This causes continuous resets on the failed device's bus.
HSG80 array controllers, which are highly redundant, make every effort to
complete the read and write operations that hosts request. If you implement
redundancy through host-based mirroring of non-redundant storage containers
across multiple controllers, a controller cannot detect the higher level
of redundancy that the host provides for a specific unit.
If you use LSM with Tru64 UNIX for host mirroring and the mirrored units are
non-redundant storage containers, the array controller recovers from errors
more quickly, which allows LSM to transfer the I/O requests to the
host-mirrored storage units.
After an examination of unit error handling operations, the following changes
have been made to ACS V8.8-1:
- If a device reports a hardware error, the error is reported to the unit if it
  is related to the host I/O. If a second hardware error is reported by or
  against the same physical device, the second hardware error is reported to
  the unit as an E0_06, and one of the following occurs:
  - A redundant and normalized set is reduced, and the bad device is ejected
    to the failed set.
  - A non-redundant container is transitioned to an Inoperative state. If the
    host retries a command, a check condition with SK=2 (not ready) and
    ASC/Q of 04_00 is reported. The host might retry the I/O until it
    suspends its retry attempts.
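
The error handling sequence described above can be pictured with a short Python sketch. It is an illustrative model only, not ACS code: the Unit class, the function names, and the device names are hypothetical, and the sketch assumes that a third or later hardware error against the same device is handled the same way as the second. Only the E0_06 report, the reduced set and failed set behavior, the Inoperative state, and the SK=2 with ASC/Q 04_00 response come from the text.

```python
from dataclasses import dataclass, field

# Sense values quoted in the text: SK=2 (not ready) with ASC/Q of 04_00.
SENSE_KEY_NOT_READY = 0x02
ASC_ASCQ_NOT_READY = (0x04, 0x00)


@dataclass
class Unit:
    """Hypothetical model of a unit; not an ACS data structure."""
    name: str
    redundant: bool      # True for a redundant and normalized set
    members: list        # devices currently in the set or container
    failed_set: list = field(default_factory=list)
    inoperative: bool = False
    hw_errors: dict = field(default_factory=dict)  # device -> error count
    reported: list = field(default_factory=list)   # errors reported to unit


def report_hardware_error(unit, device):
    """Record a host-I/O-related hardware error by or against a device."""
    count = unit.hw_errors.get(device, 0) + 1
    unit.hw_errors[device] = count
    if count == 1:
        # First error: reported to the unit, no state change.
        unit.reported.append("hardware error")
        return
    # Second error (assumption: later errors too) is reported as E0_06.
    unit.reported.append("E0_06")
    if unit.redundant:
        # Reduce the set and eject the bad device to the failed set.
        if device in unit.members:
            unit.members.remove(device)
            unit.failed_set.append(device)
    else:
        # A non-redundant container transitions to Inoperative.
        unit.inoperative = True


def host_retry(unit):
    """Return the status a host retry would see in this model."""
    if unit.inoperative:
        return ("CHECK CONDITION", SENSE_KEY_NOT_READY, ASC_ASCQ_NOT_READY)
    return ("GOOD", None, None)
```

In this model, two errors against a member of a redundant set leave the unit usable with one fewer member, while two errors against a non-redundant container make every later host retry return the not ready check condition until the host suspends its retries.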