
Chapter 4. Continuous availability and manageability 175

result is stored in system NVRAM. Error log analysis (ELA) can be used to display the failure cause and the physical location of the failing hardware.

With the integrated service processor, the system can automatically send an alert through a phone line to a pager, or call for service, in the event of a critical system failure.

A hardware fault also illuminates the amber system fault LED on the system unit to alert the user to an internal hardware problem.

On POWER7 processor-based servers, hardware and software failures are recorded in the system log. When a management console is attached, an ELA routine analyzes the error, forwards the event to the Service Focal Point (SFP) application running on the management console, and can notify the system administrator that it has isolated a likely cause of the system problem. The service processor event log also records unrecoverable checkstop conditions, forwards them to the SFP application, and notifies the system administrator. After the information is logged in the SFP application, and if the system is properly configured, a call-home service request is initiated and the pertinent failure data, including service parts information and part locations, is sent to the IBM service organization. This information also contains the client contact information defined in the Electronic Service Agent (ESA) guided set-up wizard.
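
The event path above can be sketched as follows. This is a minimal illustration, not IBM's implementation: the classes, the forward_event function, and the call_home_configured and esa_contact fields are hypothetical names standing in for the management console, the "properly configured" condition, and the contact captured by the ESA wizard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical model of the event path described above; none of these names
# come from IBM firmware, HMC, or ESA interfaces.
@dataclass
class ServiceRequest:
    event_id: str
    failure_data: str
    part_numbers: List[str]
    part_locations: List[str]
    contact: str  # client contact captured by the (hypothetical) ESA wizard step


@dataclass
class ServiceFocalPoint:
    call_home_configured: bool   # stands in for "system is properly configured"
    esa_contact: Optional[str]   # None if no contact was defined
    admin_notifications: List[str] = field(default_factory=list)
    service_requests: List[ServiceRequest] = field(default_factory=list)

    def receive(self, event_id: str, failure_data: str,
                part_numbers: List[str], part_locations: List[str]) -> None:
        # Every event forwarded to the SFP is reported to the administrator.
        self.admin_notifications.append(event_id)
        # A call-home request is opened only when call home is configured
        # and there is a contact to attach to it.
        if self.call_home_configured and self.esa_contact is not None:
            self.service_requests.append(ServiceRequest(
                event_id, failure_data, list(part_numbers),
                list(part_locations), self.esa_contact))


def forward_event(console: Optional[ServiceFocalPoint], event_id: str,
                  failure_data: str, part_numbers: List[str],
                  part_locations: List[str]) -> bool:
    """Forward an analyzed event to the SFP if a management console is attached."""
    if console is None:
        return False  # no console attached: this sketch forwards nothing
    console.receive(event_id, failure_data, part_numbers, part_locations)
    return True


sfp = ServiceFocalPoint(call_home_configured=True, esa_contact="ops@example.com")
forward_event(sfp, "EVT-1", "checkstop", ["PN-A"], ["loc-A"])
```

Keeping administrator notification and call-home as separate steps mirrors the text: the administrator is told about every forwarded event, while a service request depends on configuration.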

Error logging and analysis

When a fault isolation component has identified the root cause of an error, an error log entry is created with basic data such as:

• An error code uniquely describing the error event
• The location of the failing component
• The part number of the component to be replaced, including pertinent data such as engineering and manufacturing levels
• Return codes
• Resource identifiers
• First Failure Data Capture (FFDC) data

Data describing the effect that the repair will have on the system is also included. Error log routines in the operating system and the FSP can then use this information to decide whether the fault is a call-home candidate. If the fault requires support intervention, a call is placed with service and support, and a notification is sent to the contact defined in the ESA guided set-up wizard.
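
The entry and the call-home decision might be modeled roughly as below. This is a sketch under stated assumptions: ErrorLogEntry, is_call_home_candidate, and handle_entry are hypothetical names, and because this section does not describe the actual decision criteria, the rule used here (a fault is a candidate when it names a part to replace) is an illustrative placeholder only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErrorLogEntry:
    """Hypothetical record holding the basic data listed above."""
    error_code: str                  # uniquely describes the error event
    location: str                    # location of the failing component
    part_number: Optional[str]       # part to replace; None if none is implicated
    engineering_level: str = ""
    manufacturing_level: str = ""
    return_codes: List[int] = field(default_factory=list)
    resource_ids: List[str] = field(default_factory=list)
    ffdc_data: bytes = b""           # first failure data capture payload
    repair_impact: str = ""          # effect the repair will have on the system


def is_call_home_candidate(entry: ErrorLogEntry) -> bool:
    # ASSUMPTION: placeholder rule. The real criteria used by the operating
    # system and FSP error log routines are not described in this section.
    return entry.part_number is not None


def handle_entry(entry: ErrorLogEntry, esa_contact: str) -> Optional[str]:
    """Return notification text for a call-home candidate, otherwise None."""
    if not is_call_home_candidate(entry):
        return None
    return (f"Service call placed for {entry.error_code} at {entry.location}; "
            f"notifying {esa_contact}")


entry = ErrorLogEntry("E1", "loc-A", "PN-A", repair_impact="concurrent")
message = handle_entry(entry, "ops@example.com")
```

Keeping the decision in its own function means a different rule, for example one that also weighs repair_impact, could be swapped in without changing the entry format.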

Remote support

The Resource Monitoring and Control (RMC) subsystem is delivered as part of the base operating system, including the operating system running on the Hardware Management Console. RMC provides a secure transport mechanism across the LAN interface between the operating system and the Hardware Management Console, and the operating system diagnostic application uses it to transmit error information. RMC performs a number of other functions as well, but these are not used by the service infrastructure.

Service Focal Point

A critical requirement in a logically partitioned environment is to ensure that errors are not lost before being reported for service, and that each error is reported only once, regardless of how many logical partitions experience its potential effect. The Manage Serviceable Events task on the management console is responsible for aggregating duplicate error reports, and ensures that all errors are recorded for review and management.
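
The aggregation requirement can be illustrated with a small deduplicating store. This is a minimal sketch, not how the management console implements it: ServiceableEventManager and ServiceableEvent are hypothetical names, and treating two reports as duplicates when they share an error code and location is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ServiceableEvent:
    error_code: str
    location: str
    reporting_partitions: Set[str] = field(default_factory=set)


class ServiceableEventManager:
    """Hypothetical aggregator: one serviceable event per (error code, location)."""

    def __init__(self) -> None:
        self._events: Dict[Tuple[str, str], ServiceableEvent] = {}

    def report(self, partition: str, error_code: str, location: str) -> bool:
        """Record a report; return True only for the first report of an error."""
        key = (error_code, location)  # ASSUMPTION: duplicate-detection key
        event = self._events.get(key)
        is_new = event is None
        if event is None:
            event = ServiceableEvent(error_code, location)
            self._events[key] = event
        # Every report is kept, so no partition's view of the error is lost.
        event.reporting_partitions.add(partition)
        return is_new

    def events(self) -> List[ServiceableEvent]:
        return list(self._events.values())


mgr = ServiceableEventManager()
for lpar in ("lpar1", "lpar2", "lpar3"):
    mgr.report(lpar, "ERR-0001", "loc-A")
```

In this sketch, three partitions reporting the same fault yield a single serviceable event that still records all three reporters, which matches the two goals in the text: report once, lose nothing.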