Specifications
6-2 Sun StorEdge A1000 and A3x00/A3500FC Best Practices Guide • November 2002
6.1 Controller Held in Reset, Causes, and How to Recover
This section contains the following topics:
■ Section 6.1.1, “Reason Controllers Should be Failed” on page 6-2
■ Section 6.1.2, “Failing a Controller in Dual/Active Mode” on page 6-3
■ Section 6.1.3, “Replacing a Failed Controller” on page 6-4
The A3x00/A3500FC controllers do not detect their own failure and do not fail
themselves. Either the host system, through the A3x00/A3500FC drivers, or the user
must make the decision to fail a controller. A controller can be failed only in a
system with redundant controllers.
The redundant array controller architecture was developed on the premise that the
host system is best able to determine when a subsystem component has failed. A
failed controller is one that is held in a hardware reset state. A user should fail a
controller if there is cause for concern about the controller’s hardware. Once a
controller is failed (that is, held in a hardware reset state), it cannot access any
data on the disk drives.
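The rules above can be captured in a small state model. There are three of them. Only the host or the user fails a controller. Failing requires a redundant partner that is still active. A failed controller is held in hardware reset and has no drive access. This is a minimal sketch under those assumptions. The `ArrayModule` and `Controller` classes and their methods are hypothetical illustrations, not part of RAID Manager or the A3x00/A3500FC drivers.

```python
from dataclasses import dataclass, field
from enum import Enum


class ControllerState(Enum):
    ACTIVE = "active"
    HELD_IN_RESET = "held in reset"  # i.e. "failed": no access to disk data


@dataclass
class Controller:
    # Hypothetical model of one array controller.
    name: str
    state: ControllerState = ControllerState.ACTIVE

    def can_access_drives(self) -> bool:
        # A controller held in hardware reset cannot reach the disk drives.
        return self.state is ControllerState.ACTIVE


@dataclass
class ArrayModule:
    # Hypothetical model of a controller pair; names are illustrative only.
    controllers: list = field(default_factory=list)

    def fail(self, name: str) -> None:
        """Host- or user-initiated failure; controllers never fail themselves."""
        target = self._find(name)
        alternates = [c for c in self.controllers
                      if c is not target and c.state is ControllerState.ACTIVE]
        if not alternates:
            # Failing is only possible while a redundant controller stays active.
            raise RuntimeError(f"cannot fail {name}: no active alternate controller")
        target.state = ControllerState.HELD_IN_RESET

    def _find(self, name: str) -> Controller:
        for c in self.controllers:
            if c.name == name:
                return c
        raise KeyError(name)
```

With `pair = ArrayModule([Controller("A"), Controller("B")])`, calling `pair.fail("A")` holds controller A in reset. A later `pair.fail("B")` raises `RuntimeError`, because the last active controller cannot be failed.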
6.1.1 Reason Controllers Should be Failed
■ Unresponsive controller
An array controller may become unresponsive as a result of a controller host chip
failure, a loss of power to one of the controllers, or a controller hardware failure.
The controller should always be reset and given adequate time to cycle through
its reset logic before any further action is taken.
Typical symptoms of an unresponsive controller include selection time-outs,
continuous command time-outs, or both. The host should first attempt to revive
the controller from a possible hung state with a bus reset. If this fails, the host
should continue to access the configured LUNs through the alternate controller
and fail this controller.
■ Obtrusive controller
An obtrusive array controller is one that interferes with the normal operation of
its alternate. This may be the result of a failing data path component in one of the
array controllers, a drive-side SCSI bus failure on an array controller, or a failing
disk drive that has not yet been marked failed.
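The escalation order described for an unresponsive controller can be sketched as a single function. The order is:

1. Reset the controller and wait for its reset cycle.
2. Try a bus reset.
3. Keep the LUNs available through the alternate controller and fail this one.

This is a minimal sketch, not a driver implementation. Each action is a hypothetical caller-supplied hook (`reset_controller`, `bus_reset`, `fail_controller`, `route_luns_to_alternate`, `is_responsive`). The wait length `reset_cycle_seconds` is an assumed parameter, because this guide gives no specific value.

```python
import time
from typing import Callable

# Hypothetical hook type: each action is a zero-argument callable supplied by
# the caller (for example, a wrapper around a driver or management operation).
Action = Callable[[], None]


def handle_unresponsive(
    is_responsive: Callable[[], bool],
    reset_controller: Action,
    bus_reset: Action,
    fail_controller: Action,
    route_luns_to_alternate: Action,
    *,
    reset_cycle_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Escalate recovery of an unresponsive controller: reset, bus reset, fail.

    reset_cycle_seconds is an assumed wait long enough for the controller to
    cycle through its reset logic; it is a parameter, not a documented value.
    """
    # 1. Always reset first and let the controller complete its reset cycle.
    reset_controller()
    sleep(reset_cycle_seconds)
    if is_responsive():
        return "recovered after controller reset"

    # 2. Try to revive a possibly hung controller with a bus reset.
    #    (Waiting again afterwards is an assumption of this sketch.)
    bus_reset()
    sleep(reset_cycle_seconds)
    if is_responsive():
        return "recovered after bus reset"

    # 3. Keep the configured LUNs available through the alternate controller,
    #    then fail (hold in reset) the unresponsive one.
    route_luns_to_alternate()
    fail_controller()
    return "failed; LUNs served by alternate controller"
```

Injecting the hooks and `sleep` keeps the decision order testable without hardware. For example, passing `sleep=lambda s: None` and hooks that record their calls shows the exact sequence of actions taken.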