Deployment Guide
Performing basic steps
Use any of the options described in the previous sections to perform the basic steps of the fault isolation methodology.
Gather fault information
When a fault occurs, gather as much information as possible. Doing so helps determine the correct action that is needed to remedy the
fault.
Begin by reviewing the reported fault:
• Is the fault related to an internal data path or an external data path?
• Is the fault related to a hardware component such as a disk drive module, controller module, or power supply unit?
By isolating the fault to one of the components within the storage system, you are able to determine the necessary corrective action more
quickly.
Determine where the fault is occurring
When a fault occurs, the Module Fault LED illuminates. Check the LEDs on the back of the enclosure to narrow the fault to a CRU,
connection, or both. The LEDs also help you identify the location of a CRU reporting a fault.
Use the PowerVault Manager to verify any faults found while viewing the LEDs. If the LEDs cannot be viewed due to the location of the
system, use the PowerVault Manager to determine where the fault is occurring. This web application provides a visual
representation of the system and shows where the fault is occurring. The PowerVault Manager also provides more detailed information about
CRUs, data, and faults.
Review the event logs
The event logs record all system events. Each event has a numeric code that identifies the type of event that occurred, and has one of
the following severities:
• Critical – A failure occurred that may cause a controller to shut down. Correct the problem immediately.
• Error – A failure occurred that may affect data integrity or system stability. Correct the problem as soon as possible.
• Warning – A problem occurred that may affect system stability, but not data integrity. Evaluate the problem and correct it if
necessary.
• Informational – A configuration or state change occurred, or a problem occurred that the system corrected. No immediate action is
required.
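The severity levels above can be treated as a triage order. The following is a minimal sketch, assuming event records have already been exported into Python dictionaries; the record layout, the `triage` helper, and the sample event codes are hypothetical and are not part of the PowerVault Manager interface.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Lower value = more urgent, following the order of the list above.
    CRITICAL = 0
    ERROR = 1
    WARNING = 2
    INFORMATIONAL = 3


# Recommended responses, taken from the severity descriptions above.
RECOMMENDED_ACTION = {
    Severity.CRITICAL: "Correct the problem immediately.",
    Severity.ERROR: "Correct the problem as soon as possible.",
    Severity.WARNING: "Evaluate the problem and correct it if necessary.",
    Severity.INFORMATIONAL: "No immediate action is required.",
}


def triage(events):
    """Return (code, severity name, action) tuples, most urgent first."""
    ordered = sorted(events, key=lambda e: e["severity"])
    return [
        (e["code"], e["severity"].name, RECOMMENDED_ACTION[e["severity"]])
        for e in ordered
    ]


# Hypothetical sample records; the codes are placeholders, not real event codes.
sample = [
    {"code": 1001, "severity": Severity.INFORMATIONAL},
    {"code": 1002, "severity": Severity.CRITICAL},
    {"code": 1003, "severity": Severity.WARNING},
]

for code, name, action in triage(sample):
    print(code, name, action)
```

Sorting by severity first keeps Critical and Error events at the top of the review, which matches the urgency described for each level.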
Review the logs to identify the fault and the cause of the failure. For example, a host could lose
connectivity to a disk group if a user changes channel settings without considering the storage resources that are assigned to it.
In addition, the type of fault can help you isolate the problem to either hardware or software.
Isolate the fault
Occasionally, it might become necessary to isolate a fault. This is especially true with data paths, because of the number of components
that make up the data path. For example, if a host-side data error occurs, it could be caused by any of the components in the data path:
controller module, cable, or data host.
If the enclosure does not initialize
It may take up to two minutes for all enclosures to initialize.
If an enclosure does not initialize:
• Perform a rescan
• Power cycle the system
• Make sure that the power cord is properly connected, and check the power source to which it is connected
• Check the event log for errors
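The wait-then-troubleshoot logic above can be sketched as a timeout loop. This is only an illustration under stated assumptions: `is_initialized` is a hypothetical callable that you would supply (for example, a check you perform against your own monitoring), and the function names and polling interval are not taken from any PowerVault tool.

```python
import time

# "Up to two minutes" from the text above.
INIT_TIMEOUT_SECONDS = 120

# The actions listed above, in the order they appear.
RECOVERY_STEPS = [
    "Perform a rescan",
    "Power cycle the system",
    "Make sure that the power cord is properly connected, "
    "and check the power source to which it is connected",
    "Check the event log for errors",
]


def wait_for_initialization(is_initialized, timeout=INIT_TIMEOUT_SECONDS,
                            poll_interval=5.0):
    """Poll is_initialized() until it returns True or the timeout expires.

    Returns an empty list on success, or the list of recovery steps
    to work through if the enclosure did not initialize in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_initialized():
            return []
        if time.monotonic() >= deadline:
            return list(RECOVERY_STEPS)
        time.sleep(poll_interval)


# Usage with a hypothetical check that never succeeds and a zero timeout,
# so the example returns immediately with the recovery steps.
for step in wait_for_initialization(lambda: False, timeout=0, poll_interval=0):
    print("-", step)
```

Waiting for the full initialization window before acting avoids power cycling an enclosure that is still starting up normally.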