Technical data

CHALLENGE/Onyx Diagnostic Road Map 2-3

If the system is still hung, follow these steps to help isolate the problem:

1. Examine the serial port console, if available. Do the last messages look like normal

activity, or is the serial port console showing a panic or sitting at a DBG: prompt?

If there are no signs of a kernel panic or crash, type a few characters on the serial

console and see if they echo. If they echo, then the kernel is still ticking at interrupt

level; this is probably a software bug.

Corrective action: Generate a nonmaskable interrupt (NMI) core dump and file a bug

report. See Section 4.5.2, “Key Switch in the Manager Position,” for the System

Controller menu selections to create an NMI core dump.

2. Look at the power meter and open the first drive tray to see the disk LEDs. Note whether

the power meter is moving. (It updates about once per second.) See whether the disk LEDs are

blinking.

Note: Distinguish between an LED that is stuck in the “on” state and one that is blinking.

Disk LEDs can sometimes blink very rapidly and might appear at first to be

stuck on.

If either the power meter or the disk LEDs show activity, some processes are still

running on the system. A disk with an LED almost constantly on could indicate heavy

swapping activity, causing what looks like a hang. If the system exhibits these

symptoms, suspect a software problem.

Corrective action: Generate an NMI core dump and file a bug report.

3. If the system is on a network, log in to another host and try to ping the hung system

by using the command /usr/etc/ping. If ping indicates 100 percent packet loss, then the

kernel is not ticking at interrupt level.

Try to log in to the hung system using rlogin. If you can log in successfully, type date

at the shell prompt, and then type ps -efl. If both of these commands run properly, the

hang is a software problem.

Corrective action: Generate an NMI core dump and file a bug report.

If the date and ps commands do not run properly, proceed to the next step.

4. If there was no response from the serial console, the power meter, or the drive LEDs, and

date and ps did not run over the network, try to determine whether processors are stuck

spinning in the kernel. Perform a front-panel NMI.

If one NMI causes the kernel to display

PANIC: User requested vmcore dump (NMI)

then processors are responding normally. This is probably a software problem.

Corrective action: File a bug report.

5. If there is no response to the first NMI, then issue a second NMI.

After you issue the second NMI, the bootmaster processor will try to enter the PROM

power-on diagnostic monitor (POD). If POD is successfully entered, follow the

procedure in Section 5.6.4, “Using POD to Examine HARDWARE ERROR STATE

Messages,” to see if any hardware bits are set. If any hardware bits are set, then this is

probably a hardware failure.

Corrective action: Based on the HARDWARE ERROR STATE messages, swap the

appropriate hardware to locate and eliminate the problem.
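
The decision flow in steps 1 through 5 can be summarized as a short Python sketch. This is only an illustration of the order of the checks, not a tool shipped with the system: the Observations class, its field names, and the triage function are hypothetical names chosen for this example, and each field must be filled in by hand from what you observe at the console, the front panel, and over the network. Cases that this section does not describe (for example, POD is entered but no hardware bits are set) are reported as undetermined rather than guessed.

```python
from dataclasses import dataclass
from typing import Tuple

# Corrective action text used by steps 1 through 3.
DUMP_AND_REPORT = "Generate an NMI core dump and file a bug report"


@dataclass
class Observations:
    """Hand-entered results of the checks (hypothetical field names)."""
    console_echoes: bool = False         # step 1: typed characters echo on the serial console
    meter_or_leds_active: bool = False   # step 2: power meter moving or disk LEDs blinking
    remote_date_and_ps_ok: bool = False  # step 3: date and ps -efl both ran after rlogin
    first_nmi_panics: bool = False       # step 4: first NMI produced the PANIC message
    pod_entered: bool = False            # step 5: second NMI reached POD
    hw_error_bits_set: bool = False      # step 5: HARDWARE ERROR STATE bits are set


def triage(obs: Observations) -> Tuple[str, str]:
    """Return (probable cause, corrective action), checking in step order."""
    if obs.console_echoes:
        return ("software", DUMP_AND_REPORT)
    if obs.meter_or_leds_active:
        return ("software", DUMP_AND_REPORT)
    if obs.remote_date_and_ps_ok:
        return ("software", DUMP_AND_REPORT)
    if obs.first_nmi_panics:
        # The first NMI already produced the dump, so only the report remains.
        return ("software", "File a bug report")
    if obs.pod_entered and obs.hw_error_bits_set:
        return ("hardware",
                "Swap hardware based on the HARDWARE ERROR STATE messages")
    # Anything else is outside what steps 1 through 5 describe.
    return ("undetermined", "Not covered by these steps")


if __name__ == "__main__":
    # Example: nothing responded until the first front-panel NMI.
    print(triage(Observations(first_nmi_panics=True)))
```

The checks are ordered to match the steps, so the least intrusive evidence (console echo, LEDs, network login) is considered before any NMI is issued.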