176 IBM Power 770 and 780 Technical Overview and Introduction

When a local or globally reported service request is made to the operating system, the operating system diagnostic subsystem uses the Remote Management and Control Subsystem (RMC) to relay error information to the Hardware Management Console. For global events (platform unrecoverable errors, for example), the service processor also forwards error notification of these events to the Hardware Management Console. This provides a redundant error-reporting path in case of errors in the RMC network.

The first occurrence of each failure type is recorded in the Manage Serviceable Events task on the management console. This task then filters duplicate reports from other logical partitions on the service processor and maintains a history of them. It then examines all active service event requests, analyzes the failure to determine the root cause and, if enabled, initiates a call home for service. This method ensures that all platform errors are reported through at least one functional path, ultimately resulting in a single notification for a single problem.
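The combination of redundant reporting paths and duplicate filtering can be illustrated with a small model. The following is a minimal sketch, not IBM's implementation: the names `ErrorReport`, `ServiceEventLog`, `report()`, and the failure key `PLATFORM_UE_1` are hypothetical, and it assumes that each report carries a key identifying its failure type. The first report of a failure type produces a notification; later reports of the same type, whether from another partition through RMC or from the service processor, are only added to the history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ErrorReport:
    failure_type: str  # hypothetical key identifying the failure type
    source: str        # hypothetical reporting path, e.g. "rmc:lpar1" or "service_processor"


@dataclass
class ServiceEventLog:
    # failure_type -> every report received for it; the first entry is the first occurrence
    history: dict[str, list[ErrorReport]] = field(default_factory=dict)
    # one entry per distinct problem that triggered a notification
    notifications: list[str] = field(default_factory=list)

    def report(self, rep: ErrorReport) -> bool:
        """Record a report; return True only for the first occurrence of its failure type."""
        if rep.failure_type in self.history:
            self.history[rep.failure_type].append(rep)  # duplicate: kept for history only
            return False
        self.history[rep.failure_type] = [rep]
        self.notifications.append(rep.failure_type)  # single notification for a single problem
        return True


# One platform error seen by two partitions (through RMC) and by the service processor.
log = ServiceEventLog()
reports = [
    ErrorReport("PLATFORM_UE_1", "rmc:lpar1"),
    ErrorReport("PLATFORM_UE_1", "rmc:lpar2"),
    ErrorReport("PLATFORM_UE_1", "service_processor"),
]
results = [log.report(r) for r in reports]
print(results)
print(log.notifications)

# If the RMC path is down, the service processor report alone still produces the notification.
fallback = ServiceEventLog()
fallback.report(ErrorReport("PLATFORM_UE_1", "service_processor"))
print(fallback.notifications)
```

Keying on the failure type is what collapses several reports into one notification, while keeping every report in the history preserves the duplicate information the text says the task maintains.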

Extended error data

Extended error data (EED) is additional data that is collected either automatically at the time of a failure or manually at a later time. The data collected depends on the invocation method but includes information such as firmware levels, operating system levels, additional fault isolation register values, recoverable error threshold register values, system status, and any other pertinent data.

The data is formatted and prepared for transmission back to IBM, where it helps the service support organization prepare a service action plan for the service representative or perform additional analysis.

System dump handling

In certain circumstances, an error might require a dump to be created, either automatically or manually. In this event, the dump is off-loaded to the management console. Specific management console information is included as part of the information that can optionally be sent to IBM support for analysis. If additional information relating to the dump is required, or if it becomes necessary to view the dump remotely, the management console dump record tells the IBM support center on which management console the dump is located.

4.3.4 Notifying

After a Power Systems server has detected, diagnosed, and reported an error to an appropriate aggregation point, it takes steps to notify the client and, if necessary, the IBM support organization. Depending on the assessed severity of the error and the support agreement, this can range from a simple notification to having field service personnel automatically dispatched to the client site with the correct replacement part.

Client Notify

When an event is important enough to report but does not indicate the need for a repair action or a call home to IBM service and support, it is classified as Client Notify. Clients are notified because these events might be of interest to an administrator. The event might be a symptom of an expected systemic change, such as a network reconfiguration or failover testing of redundant power or cooling systems. Examples of these events include:

- Network events, such as the loss of contact over a local area network (LAN)
- Environmental events, such as ambient temperature warnings
- Events that need further examination by the client (although these events do not necessarily require a part replacement or repair action)
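The range of notification outcomes, from Client Notify through call home to field service dispatch, can be sketched as a simple decision function. This is a hypothetical model, not the actual Power Systems logic. It assumes the event has already been judged important enough to report, and that two flags (`needs_repair`, `needs_call_home`) plus one support-agreement flag (`dispatch_covered`) are enough to choose the outcome. All names, including the example "Failed field-replaceable unit" event, are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CLIENT_NOTIFY = "client_notify"  # inform the administrator; no repair, no call home
    CALL_HOME = "call_home"          # notify IBM service and support
    DISPATCH = "dispatch"            # field service sent with the replacement part


@dataclass(frozen=True)
class Event:
    description: str
    needs_repair: bool     # hypothetical flag: a repair action is indicated
    needs_call_home: bool  # hypothetical flag: IBM service and support should be contacted


def classify(event: Event, dispatch_covered: bool) -> Action:
    """Pick an action for a reportable event; dispatch_covered stands in for the support agreement."""
    if not event.needs_repair and not event.needs_call_home:
        return Action.CLIENT_NOTIFY
    if event.needs_repair and dispatch_covered:
        return Action.DISPATCH
    return Action.CALL_HOME


events = [
    Event("Loss of contact over the LAN", needs_repair=False, needs_call_home=False),
    Event("Ambient temperature warning", needs_repair=False, needs_call_home=False),
    Event("Failed field-replaceable unit", needs_repair=True, needs_call_home=True),
]
for ev in events:
    print(ev.description, "->", classify(ev, dispatch_covered=True).value)
```

In this model, the two example events from the list above fall into Client Notify because neither a repair nor a call home is indicated. The same hardware failure yields either a dispatch or a plain call home depending only on the support-agreement flag.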