Technical data
CHALLENGE/Onyx Diagnostic Road Map 2-3
If the system is still hung, follow these steps to help isolate the problem:
1. Examine the serial port console, if available. Do the last messages look like normal
activity, or is the serial port console showing a panic or sitting at a DBG: prompt?
If there are no signs of a kernel panic or crash, type a few characters on the serial
console and see if they echo. If they echo, then the kernel is still ticking at interrupt
level; this is probably a software bug.
Corrective action: Generate a nonmaskable interrupt (NMI) core dump and file a bug
report. See Section 4.5.2, “Key Switch in the Manager Position,” for the System
Controller menu selections to create an NMI core dump.
2. Look at the power meter and open the first drive tray to see the disk LEDs. Notice if
the power meter is moving. (It updates about once per second.) See if disk LEDs are
blinking.
Note: Distinguish between an LED that is stuck on and one that is blinking.
Disk LEDs can sometimes blink very rapidly and might appear at first to be
stuck on.
If either the power meter or the disk LEDs show activity, some processes are still
running on the system. A disk with an LED almost constantly on could indicate heavy
swapping activity, causing what looks like a hang. If the system exhibits these
symptoms, suspect a software problem.
Corrective action: Generate an NMI core dump and file a bug report.
3. If the system is on a network, log in to another host and try to ping the hung system
by using the command /usr/etc/ping. If ping indicates 100 percent packet loss, then the
kernel is not ticking at interrupt level.
Try to log in to the frozen system using rlogin. If you can log in successfully, type date
at the shell prompt, and then type ps -efl. If both of these commands run properly, the
hang is a software problem.
Corrective action: Generate an NMI core dump and file a bug report.
If the date and ps commands do not run properly, proceed to the next step.
4. If there was no response from the serial console, power meter, or drive LEDs, and you
could not log in and run date or ps over the network, try to determine if processors are
stuck spinning in the kernel. Perform a front-panel NMI.
If one NMI causes the kernel to display
PANIC: User requested vmcore dump (NMI)
then processors are responding normally. This is probably a software problem.
Corrective action: File a bug report.
5. If there is no response to the first NMI, then issue a second NMI.
After you issue the second NMI, the bootmaster processor will try to enter the PROM
power-on diagnostic monitor (POD). If POD is successfully entered, follow the
procedure in Section 5.6.4, “Using POD to Examine HARDWARE ERROR STATE
Messages,” to see if any hardware bits are set. If any hardware bits are set, then this is
probably a hardware failure.
Corrective action: Based on the HARDWARE ERROR STATE messages, swap the
appropriate hardware to locate and eliminate the problem.
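The five steps above form a decision sequence, and it can help to see them written out as one. The following Python sketch is only an illustration of that sequence, not a tool shipped with the system: the Observations class, its field names, and the triage function are hypothetical names introduced here, and the technician fills in each observation by hand. It covers only the branches the steps spell out. Any case the steps do not address is reported as undetermined.

```python
from dataclasses import dataclass

# Corrective-action wording taken from the steps above.
NMI_DUMP = "Generate an NMI core dump and file a bug report"
BUG_REPORT = "File a bug report"
SWAP_HW = "Swap hardware based on the HARDWARE ERROR STATE messages"


@dataclass
class Observations:
    """What the technician saw at each step (hypothetical field names)."""
    console_echoes: bool = False         # step 1: typed characters echo
    power_meter_moving: bool = False     # step 2
    disk_leds_blinking: bool = False     # step 2
    remote_date_and_ps_ok: bool = False  # step 3: rlogin, date, and ps all work
    first_nmi_panics: bool = False       # step 4: PANIC message after one NMI
    pod_entered: bool = False            # step 5: POD entered after second NMI
    hardware_bits_set: bool = False      # step 5: hardware error bits set


def triage(obs: Observations) -> tuple[str, str]:
    """Return (probable cause, corrective action), checking steps in order."""
    if obs.console_echoes:
        return ("software", NMI_DUMP)
    if obs.power_meter_moving or obs.disk_leds_blinking:
        return ("software", NMI_DUMP)
    if obs.remote_date_and_ps_ok:
        return ("software", NMI_DUMP)
    if obs.first_nmi_panics:
        return ("software", BUG_REPORT)
    if obs.pod_entered and obs.hardware_bits_set:
        return ("hardware", SWAP_HW)
    # The steps above do not cover this case.
    return ("undetermined", "No corrective action given by these steps")


if __name__ == "__main__":
    cause, action = triage(Observations(disk_leds_blinking=True))
    print(cause + ": " + action)
```

The checks run in the same order as the numbered steps, so the first positive observation decides the result, just as the procedure stops at the first step that yields a corrective action.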










