Specifications
176 IBM Power 770 and 780 Technical Overview and Introduction
When a locally or globally reported service request is made to the operating system, the
operating system diagnostic subsystem uses the Resource Monitoring and Control (RMC)
subsystem to relay error information to the Hardware Management Console. For
global events (platform unrecoverable errors, for example), the service processor also
forwards error notification of these events to the Hardware Management Console, providing a
redundant error-reporting path in case of errors in the RMC network.
The first occurrence of each failure type is recorded in the Manage Serviceable Events task
on the management console. This task then filters duplicate reports from other logical
partitions or the service processor and maintains a history of them. It then reviews all active
service event requests, analyzes the failure to determine the root cause and, if enabled,
initiates a call home for service. This approach ensures that every platform error is reported
through at least one functional path and ultimately results in a single notification for a
single problem.
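The filtering behavior described above can be illustrated with a minimal Python sketch. It is not the management console's implementation: the names ErrorReport, ServiceableEventLog, and failure_key, and the source labels, are hypothetical and chosen only to show how the first report of a failure opens an event while later reports of the same failure are kept as history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ErrorReport:
    """One error report arriving at the management console (hypothetical model)."""
    failure_key: str  # assumed identifier for the failure type
    source: str       # assumed label for the reporting path


@dataclass
class ServiceableEvent:
    """The single event kept per failure, plus its duplicate-report history."""
    first_report: ErrorReport
    duplicates: list = field(default_factory=list)


class ServiceableEventLog:
    def __init__(self):
        self.events = {}  # failure_key -> ServiceableEvent

    def record(self, report):
        """Return True if the report opened a new event (one notification),
        False if it was filed as a duplicate of an existing event."""
        event = self.events.get(report.failure_key)
        if event is None:
            self.events[report.failure_key] = ServiceableEvent(first_report=report)
            return True
        event.duplicates.append(report)
        return False


log = ServiceableEventLog()
reports = [
    ErrorReport("FAIL-A", "lpar1"),
    ErrorReport("FAIL-A", "service_processor"),  # redundant path, same failure
    ErrorReport("FAIL-A", "lpar2"),
    ErrorReport("FAIL-B", "lpar1"),
]
# Only the first report of each failure produces a notification.
notifications = [r.failure_key for r in reports if log.record(r)]
print(notifications)
```

In this sketch, three reports of the same failure arrive over different paths, yet only one event is opened for it; the other two are retained as duplicate history on that event.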
Extended error data
Extended error data (EED) is additional data that is collected either automatically at the time
of a failure or manually at a later time. The data collected depends on the invocation
method, but includes information such as firmware levels, operating system levels, additional
fault isolation register values, recoverable error threshold register values, system status, and
any other pertinent data.
The data is formatted and prepared for transmission back to IBM to assist the service support
organization with preparing a service action plan for the service representative or for
additional analysis.
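As an illustration only, the kinds of data listed above could be modeled as a simple record that is serialized before transmission. The class name, field names, JSON encoding, and placeholder values in this Python sketch are all assumptions; the actual EED format is not described in this document.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExtendedErrorData:
    """Hypothetical container for the EED items named in the text."""
    invocation: str        # "automatic" (at failure time) or "manual" (later)
    firmware_level: str
    os_level: str
    fault_isolation_registers: dict = field(default_factory=dict)
    recoverable_error_thresholds: dict = field(default_factory=dict)
    system_status: str = "unknown"

    def __post_init__(self):
        # The text names exactly two ways the collection can be invoked.
        if self.invocation not in ("automatic", "manual"):
            raise ValueError(f"unknown invocation method: {self.invocation!r}")

    def to_payload(self):
        """Format the record for transmission (JSON is an assumed encoding)."""
        return json.dumps(asdict(self), sort_keys=True)


eed = ExtendedErrorData(
    invocation="automatic",
    firmware_level="FW_LEVEL_PLACEHOLDER",  # placeholder, not a real level
    os_level="OS_LEVEL_PLACEHOLDER",        # placeholder, not a real level
    fault_isolation_registers={"FIR0": "0x0000000000000000"},
    system_status="degraded",
)
payload = eed.to_payload()
print(payload)
```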
System dump handling
In certain circumstances, an error might require a dump to be created, either automatically or
manually. In this event, the dump is offloaded to the management console. Specific
management console information is included as part of the information that can optionally be
sent to IBM support for analysis. If additional information relating to the dump is required, or
if it becomes necessary to view the dump remotely, the management console dump record
tells the IBM support center which management console holds the dump.
4.3.4 Notifying
After a Power Systems server has detected, diagnosed, and reported an error to an
appropriate aggregation point, it notifies the client and, if necessary, the IBM
support organization. Depending on the assessed severity of the error and the support
agreement, this notification can range from a simple message to having field service
personnel automatically dispatched to the client site with the correct replacement part.
Client Notify
When an event is important enough to report, but does not indicate the need for a repair
action or the need to call home to IBM service and support, it is classified as Client Notify.
Clients are notified because these events might be of interest to an administrator. The event
might be a symptom of an expected systemic change, such as a network reconfiguration or
failover testing of redundant power or cooling systems. Examples of these events include:
- Network events such as the loss of contact over a local area network (LAN)
- Environmental events such as ambient temperature warnings
- Events that need further examination by the client (although these events do not
  necessarily require a part replacement or repair action)
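The Client Notify rule described above can be expressed as a small decision function. This Python sketch is hypothetical: the Notification enum, the classify_event function, and its two boolean inputs are illustrative names, and the SERVICE_REQUIRED category simply stands for events that do need a repair action or a call home.

```python
from enum import Enum


class Notification(Enum):
    NOT_REPORTED = "not reported"
    CLIENT_NOTIFY = "client notify"
    SERVICE_REQUIRED = "service required"


def classify_event(reportable, needs_service):
    """Classify an event using the rule stated in the text.

    reportable:    the event is important enough to report
    needs_service: the event calls for a repair action or a call home
    """
    if not reportable:
        return Notification.NOT_REPORTED
    if needs_service:
        return Notification.SERVICE_REQUIRED
    # Important enough to report, but no repair or call home is needed.
    return Notification.CLIENT_NOTIFY


# Example: loss of contact over a LAN is reportable but needs no repair.
lan_loss = classify_event(reportable=True, needs_service=False)
print(lan_loss.value)
```

An ambient temperature warning would be classified the same way as the LAN example, because it is of interest to an administrator without requiring a part replacement.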