Technical data

2-4 Diagnostic Procedures

If no error bits are set, then it is probably a software problem.

Corrective action: File a bug report.

6. If there is no response to the second NMI, use the procedure in Section 2.5.5, “Procedure to Cause a Hung System to Enter POD Mode,” to try to reset into POD.

If there are no error bits set, then it is still probably a software problem, with corruption of kernel memory. Entering POD depends on a few words of memory being correct.

Corrective action: File a bug report.

Note: One hardware problem that can look like a software problem under the guidelines in Table 2-1 occurs when some (but not all) processors stop executing instructions normally while at least one CPU continues to execute. If you suspect this may be the case, see Section 2.5.4, “Using a Debug Kernel to Find System Hangs.”

2.2.2 What to Do if the System Has Been Reset or Rebooted

If the system has already been reset or rebooted by the time you examine it, no state from the hang remains to examine. Ask the customer to allow you to examine the system the next time it hangs.

If the customer cannot wait for you to arrive after the system hangs, ask the customer to use the System Controller to issue a nonmaskable interrupt (NMI), which should create a system core dump.

If the system dumps core after the customer issues an NMI, then you should suspect a software problem, in particular with the operating system (IRIX). If the system doesn’t respond, then the hardware may be at fault. If there is a hardware problem, the /var/adm/SYSLOG file may contain kernel messages preceding the hang that should be included in any bug reports.
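When gathering SYSLOG material for a bug report, a small filter can help pull out the kernel messages that precede the hang. The following Python sketch is illustrative only and is not a tool shipped with the system. It assumes that kernel messages in /var/adm/SYSLOG carry the tag "unix:"; verify the tag against the actual file and adjust the marker argument if it differs. The function name recent_kernel_messages is hypothetical.

```python
def recent_kernel_messages(lines, marker="unix:", limit=20):
    """Return up to `limit` of the most recent lines that contain `marker`.

    `marker` is an ASSUMPTION about how kernel messages are tagged in
    SYSLOG; check the real file and change it if necessary.
    """
    if limit <= 0:
        return []
    # Keep the original order so the newest messages end up last.
    matches = [line.rstrip("\n") for line in lines if marker in line]
    return matches[-limit:]


def print_recent_kernel_messages(path="/var/adm/SYSLOG"):
    """Print the matching lines from the SYSLOG file named in the text."""
    # errors="replace" avoids a crash on stray non-text bytes in the log.
    with open(path, errors="replace") as syslog:
        for message in recent_kernel_messages(syslog):
            print(message)
```

Calling print_recent_kernel_messages() on the affected system (reading the file may require root privileges) prints the last 20 matching lines, which can then be pasted into the bug report alongside the description of the hang.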

2.3 Diagnosing a System Panic

When the IRIX kernel panics, it displays one of several error messages and then stops doing useful work. Kernel panics have both hardware and software causes. To determine the cause of the panic, collect the messages that the kernel printed at panic time and classify them.

At panic time, messages are displayed on the system console; this is useful only if the system console is set to the serial port console. The kernel then attempts a core dump, in which the messages are stored. At boot time, the utility savecore(1M) copies the panic messages into the file /var/adm/SYSLOG and stores the core dump in a file called /var/adm/crash/vmcore.N.comp. In the actual filename, N is a number that identifies each particular core dump if there is more than one dump file in the crash directory. You can examine panic messages in either SYSLOG or vmcore.N.comp files.
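When the crash directory holds more than one dump, it can help to list the dump files by their number N. The following Python sketch is illustrative only. It relies solely on the vmcore.N.comp naming pattern described above; the function name sort_core_dumps is hypothetical, and the sketch makes no assumption about what else savecore(1M) places in the directory.

```python
import re

# Matches the vmcore.N.comp naming pattern described in the text.
DUMP_NAME = re.compile(r"^vmcore\.(\d+)\.comp$")


def sort_core_dumps(filenames):
    """Return (N, filename) pairs for vmcore.N.comp names, ordered by N.

    Names that do not match the pattern are ignored.
    """
    dumps = []
    for name in filenames:
        match = DUMP_NAME.match(name)
        if match:
            dumps.append((int(match.group(1)), name))
    # Tuples sort by their first element, so this orders numerically by N.
    return sorted(dumps)
```

On the affected system, passing os.listdir("/var/adm/crash") to sort_core_dumps (after importing os) yields the dump files in numeric order, so that vmcore.10.comp sorts after vmcore.2.comp rather than before it, as a plain alphabetical listing would place it.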