HP StorageWorks HSG60 and HSG80 Array Controller and Array Controller Software Troubleshooting Guide (EK-G80TS-SA. C01, March 2005)

Alternative Controller Operations

Handling host-configured units in error

Handling host-configured units requires additional maintenance if a unit is in

error. Highly functional host OSes, such as Tru64 UNIX using Logical Storage

Manager (LSM), provide redundancy for storage volumes. The host

systematically maintains a viable path to a unit through internal checks on a

periodic basis. While maintaining a viable path to a unit that is in error, the host might

not disengage the storage unit and resume error-free operations. For example:

■ In several instances, while using LSM with Tru64 UNIX, the host does not

fail over to the backup controller in a timely manner after a unit becomes

inoperative. These instances usually involve non-redundant storage on

HSG80 array controllers that are configured as redundant storage by LSM

through the host.

■ In other instances, array controllers do not discontinue attempts to perform

I/O to the unit. This causes continuous resets on the failed device's bus.

HSG80 array controllers, which are highly redundant, endeavor to successfully

complete read and write operations requested by the host. If you deploy redundancy by

using host-based mirroring capabilities, with non-redundant storage containers

across multiple controllers, the controller is unable to determine the higher level

of redundancy provided for a specific unit.

If you use LSM with Tru64 UNIX for host mirroring, and the mirrored units are

non-redundant storage containers, the array controller recovers from errors more

quickly, which allows LSM to transfer the I/O requests to the host-mirrored storage

units.

As a result of examining unit error handling operations, the following changes have been

made to ACS V8.8-1:

■ If a device reports a hardware error, the error is reported to the unit if it is

related to the host I/O. If a second hardware error is reported by or against the

same physical device, the second hardware error is reported to the unit as an

E0_06, and one of the following occurs:

■ A redundant and normalized set is reduced, and the bad device is ejected to

the failed set.

■ A non-redundant container is transitioned to an Inoperative state. If the

host retries a command, a check condition with SK=2 (not ready)

and ASC/Q of 04_00 is reported. The host might retry the I/O until it

suspends its retry attempts.
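
The second-error rule described above can be sketched as a small model. This is an illustration only, not ACS code: the names Unit, UnitState, report_hardware_error, host_io, and host_retry are hypothetical, as are the device names and the retry limit. A single "redundant" flag stands in for "redundant and normalized", and the model does not cover further failures on an already reduced set.

```python
from dataclasses import dataclass, field
from enum import Enum


class UnitState(Enum):
    NORMAL = "normal"
    REDUCED = "reduced"
    INOPERATIVE = "inoperative"


# SCSI values named in the text: sense key 2 (not ready), ASC/ASCQ 04/00.
SK_NOT_READY = 0x02
ASC_ASCQ_NOT_READY = (0x04, 0x00)


@dataclass
class Unit:
    """Hypothetical model of a unit; not the controller's data structure."""
    name: str
    redundant: bool  # assumption: stands in for "redundant and normalized"
    state: UnitState = UnitState.NORMAL
    failed_set: list = field(default_factory=list)
    error_counts: dict = field(default_factory=dict)

    def report_hardware_error(self, device: str) -> str:
        """Count a hardware error against a device and apply the rule."""
        count = self.error_counts.get(device, 0) + 1
        self.error_counts[device] = count
        if count < 2:
            # First error: reported to the unit, no state change.
            return "first-error"
        if self.redundant:
            # Second error: reduce the set, eject the device to the failed set.
            self.state = UnitState.REDUCED
            if device not in self.failed_set:
                self.failed_set.append(device)
        else:
            # Second error on a non-redundant container.
            self.state = UnitState.INOPERATIVE
        return "E0_06"

    def host_io(self):
        """Return (status, sense key, (ASC, ASCQ)) for a host command."""
        if self.state is UnitState.INOPERATIVE:
            return ("CHECK CONDITION", SK_NOT_READY, ASC_ASCQ_NOT_READY)
        return ("GOOD", None, None)


def host_retry(unit: Unit, max_retries: int = 3):
    """Retry host I/O until it succeeds or the host gives up (assumed limit)."""
    attempts = 0
    for attempts in range(1, max_retries + 1):
        status, _, _ = unit.host_io()
        if status == "GOOD":
            return True, attempts
    return False, attempts


if __name__ == "__main__":
    unit = Unit("D1", redundant=False)
    unit.report_hardware_error("DISK10000")
    unit.report_hardware_error("DISK10000")
    print(unit.state, unit.host_io(), host_retry(unit))
```

In this sketch, a host that mirrors the unit at a higher level (for example, through LSM) would treat the exhausted retries as the signal to direct I/O to the other mirror member.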