PCI Error Handling Product Note
HP-UX Servers and Workstations
Fourth Edition
Manufacturing Part Number: 5992-3799
March 2008
United States
© Copyright 2001-2008 Hewlett-Packard Development Company LP. All rights reserved.
Legal Notices
The information in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.

Publishing History
New editions of this manual will incorporate information that is new or has changed since the previous edition was published (minor typographical or formatting corrections do not result in the publication of a new edition). The publishing date, manufacturing part number, and edition number all change each time a new edition is published, providing unique identification for each edition.
Contents
What is PCI Error Handling?
Accessing and Installing the PCI Error Handling Feature
    Confirm PCI Error Handling is Supported
    Installing PCI Error Handling from the Software Depot
What is PCI Error Handling?
The PCI Error Handling feature allows an HP-UX system to avoid a Machine Check Abort (MCA) or a High Priority Machine Check (HPMC) when a PCI error occurs (for example, a parity error). If a PCI error occurs on a bus without the PCI Error Handling feature installed, an MCA or an HPMC occurs and the system crashes.
Accessing and Installing the PCI Error Handling Feature

MP:CM> sysrev
Utility Subsystem FW Revision Level: 15.22
| Cabinet #0 | Cabinet #1 | Cab #8 | Cab #9 |
-----------------------+-----------------+-----------------+--------+--------+
| SYS FW | PDHC | | |
Cell (slot 0) | 3.64 | 15.12 | 3.82 | 15.12 | | |
Cell (slot 1) | 3.82 | 15.12 | 3.66 | 15.12 | | |
Cell (slot 2) | 3.88 | 15.14 | 3.66 | 15.12 | | |
Cell (slot 3) | 3.82 | 15.
PROGRAMMABLE HARDWARE :

System Backplane :   GPM       FM        OSP
                     -------   -------   -------
                     1.002     1.002     1.002

PCI-X Backplane :    LPM       HS
                     -------   -------
                     2.000     1.000

Core IO :            Master    Slave
                     --------  -------
                     2.010     2.010

                     LPM       PDHC
                     -------   -------
Cell 0 :             1.002     1.010
Cell 1 :             1.002     1.010
Cell 2 :             1.002     1.010
Cell 3 :             1.002     1.010

FIRMWARE:
Core IO
  Master :
  Event Dict. :
  Slave :
  Event Dict. :
A.
Cell 1  PDHC    : A.003.027
        Pri SFW : 23.001 (PA)
        Sec SFW : 23.001 (PA)
Cell 2  PDHC    : A.003.027
        Pri SFW : 23.001 (PA)
        Sec SFW : 23.001 (PA)
Cell 3  PDHC    : A.003.027
        Pri SFW : 23.001 (PA)
        Sec SFW : 23.001

NOTE The sysrev command output on some systems includes extra zeros in the system firmware version number. These zeros can be ignored. For example, 3.88 and 3.
New Error Messages for PCI Error Handling
The patch required for the btlan driver is included with the PCIErrorHandling bundle. The patches required for the igelan and iether drivers must be downloaded and installed separately from the IT Resource Center at http://www.itrc.hp.com.
- The iether driver requires patch PHNE_32199 or later.
- The igelan driver requires patch PHNE_34037 or later.
The latest version of the fcd driver (FibrChanl-01 bundle, version B.11.23.
- Error messages for the btlan, igelan, and iether drivers appear in the console log only and are not logged in syslog.
- Error messages for the fcd and mpt drivers are logged in syslog and diaglog.
- If an I/O card has multiple ports, error messages may not be reported for every port on the card if the PCI Error Handling feature suspends the driver before the error is detected on all of the ports.
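The logging rules above can be restated as a small lookup. The following Python sketch is illustrative only: the function name `log_destinations` and the destination labels are hypothetical, and the mapping simply repeats what this note says about the five drivers it names.

```python
from typing import List

# Where PCI Error Handling messages are recorded, per driver, as stated in
# this note. These names are illustrative and not part of any HP-UX tool.
CONSOLE_ONLY = {"btlan", "igelan", "iether"}
SYSLOG_AND_DIAGLOG = {"fcd", "mpt"}


def log_destinations(driver: str) -> List[str]:
    """Return the logs to check for a driver's PCI error messages."""
    name = driver.strip().lower()
    if name in CONSOLE_ONLY:
        return ["console log"]
    if name in SYSLOG_AND_DIAGLOG:
        return ["syslog", "diaglog"]
    # Drivers not mentioned in this note are deliberately rejected.
    raise ValueError(f"driver not covered by this note: {driver!r}")


if __name__ == "__main__":
    for drv in ("iether", "fcd"):
        print(drv, "->", ", ".join(log_destinations(drv)))
```

A lookup like this only indicates where to search. As noted above, a multi-port card may not report a message for every port.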
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.540624 DISASTER Subsys:IETHER Loc:00000
<1002> 1000Base-T in path 6/0/0/1/0 Was moved to DEAD state due to a PCI error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------------100BT/Gigabit Ethernet LAN/9000 Networking---------------@#%
Thu Jan 24 MST 2008 21:50:49.
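When scanning captured log text for the iether message shown above, the subsystem name and hardware path can be pulled out with a regular expression. This is a minimal Python sketch, assuming the message wording matches the sample entry exactly; the pattern, the function name `find_dead_interface`, and the `SAMPLE` constant are hypothetical helpers, and other drivers or releases may word their messages differently.

```python
import re
from typing import Dict, Optional

# Pattern derived from the single sample entry in this note (assumption:
# the wording "Was moved to DEAD state due to a PCI error" is stable).
DEAD_STATE = re.compile(
    r"Subsys:(?P<subsys>\S+).*?"
    r"in path (?P<path>[0-9/]+)\s+Was moved to DEAD state due to a PCI error",
    re.DOTALL,
)


def find_dead_interface(entry: str) -> Optional[Dict[str, str]]:
    """Return the subsystem and hardware path from a matching log entry."""
    match = DEAD_STATE.search(entry)
    if match is None:
        return None
    return {"subsystem": match.group("subsys"), "path": match.group("path")}


# Text copied from the sample entry in this note.
SAMPLE = (
    "Thu Jan 24 MST 2008 21:50:49.540624 DISASTER Subsys:IETHER Loc:00000 "
    "<1002> 1000Base-T in path 6/0/0/1/0 "
    "Was moved to DEAD state due to a PCI error."
)

if __name__ == "__main__":
    print(find_dead_interface(SAMPLE))
```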
• When the driver is suspended due to a PCI error
• When the driver is resumed after a PCI error
• When the resume operation fails due to a PCI error
• When a firmware update on the card associated with the driver fails due to a PCI error
• When an initiator ID change fails due to a PCI error

How to Online Recover from a PCI Error
The olrad command and the Attention Button can be used to attempt online recovery from a PCI error
The following example shows how the PCI Error Handling feature is used to handle a PCI error involving the iether driver:

NOTE The PCI Error Handling procedure detailed in this example may vary slightly from what you will experience, depending on the platform and I/O card driver.

A.
PCI-Express Slots Information
-----------------------------
                                                                         Driver(s)
                                                                         Capable
Slot      Path         Max Link  Link  Max Link  Link   Pwr  Occu  Susp  OLAR  OLD  Mode
                       Spd       Spd   Width     Width
0-1-1-2   6/0/2/0/0/0  2.5       2.5   x8        x8     Off  No    N/A   N/A   N/A  PCIe
0-1-1-3   6/0/4/0/0/0  2.5       2.5   x8        x1     On   Yes   N/A   N/A   N/A  PCIe
0-1-1-4   6/0/5/0/0/0  2.5       2.5   x8        x8     Off  No    N/A   N/A   N/A  PCIe
0-1-1-5   6/0/6/0/0/0  2.5       2.
0-1-1-8   6/0/12/1  26880  133  133  On   Yes  Yes  Yes  N/A  PCI-X  PCI-X
0-1-1-9   6/0/10/1  26624  133  133  Off  No   N/A  N/A  N/A  PCI-X  PCI-X
0-1-1-10  6/0/9/1   26368  133  133  On   Yes  No   Yes  N/A  PCI-X  PCI-X
0-1-1-11  6/0/8/1   26112  133  133  Off  No   N/A  N/A  N/A  PCI-X  PCI-X

PCI-Express Slots Information
-----------------------------
Driver(s) Capable Slot F.
PCI Error Handling Documentation

0-1-1-10  6/0/9/1   26368  133  133  On   Yes  No   Yes  N/A  PCI-X  PCI-X
0-1-1-11  6/0/8/1   26112  133  133  Off  No   N/A  N/A  N/A  PCI-X  PCI-X

PCI-Express Slots Information
-----------------------------
                                                                         Driver(s)
                                                                         Capable
Slot      Path         Max Link  Link  Max Link  Link   Pwr  Occu  Susp  OLAR  OLD  Mode
                       Spd       Spd   Width     Width
0-1-1-2   6/0/2/0/0/0  2.5       2.5   x8        x8     Off  No    N/A   N/A   N/A  PCIe
0-1-1-3   6/0/4/0/0/0  2.5       2.
Known Problems
• olrad manpage: after installing the PCI Error Handling feature, enter man olrad from the command line to view the version of the olrad manpage that includes PCI Error Handling information.
• Interface Card OL* Support Guide, September 2004, Manufacturing Part Number B2355-90862, available at: http://docs.hp.com
• Patch Management User Guide for HP-UX 11.x Systems, February 2007, Manufacturing Part Number 5991-6449, available at: http://docs.hp.
Terms and Definitions
HPMC High Priority Machine Check – Highest priority interruption on PA-RISC-based systems
MCA Machine Check Abort – Highest priority interruption on Itanium-based systems
Post Replace Operation - By issuing the olrad -R slot_id command after an I/O card is replaced, slot power is turned on, suspended drivers are resumed, driver scripts (post_replace) for the slot (slot_id) and affected slots (if any) are run, and the attention LED
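The post-replace operation above is invoked as olrad -R slot_id. The Python sketch below only builds that command line and does not run it. The helper name `post_replace_command` is hypothetical, and the slot ID check is an assumption inferred from the sample slot IDs in this note (for example, 0-1-1-10), not a documented format.

```python
import re
from typing import List

# Assumption: slot IDs are four dash-separated numbers, as in the sample
# olrad output in this note (e.g. "0-1-1-10"). Other platforms may differ.
SLOT_ID = re.compile(r"^\d+(-\d+){3}$")


def post_replace_command(slot_id: str) -> List[str]:
    """Build the argument list for `olrad -R slot_id` without executing it."""
    if not SLOT_ID.match(slot_id):
        raise ValueError(f"unexpected slot ID format: {slot_id!r}")
    return ["olrad", "-R", slot_id]


if __name__ == "__main__":
    # Print the command so an operator can review it before running it.
    print(" ".join(post_replace_command("0-1-1-10")))
```

Building the argument list separately from execution lets the command be reviewed or logged first, which suits an operation that powers a slot back on.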