Technical data
2-4 Diagnostic Procedures
If no error bits are set, then it is probably a software problem.
Corrective action: File a bug report.
6. If there is no response to the second NMI, use the procedure in Section 2.5.5,
“Procedure to Cause a Hung System to Enter POD Mode,” to try to reset into POD.
If there are no error bits set, then it is still probably a software problem, with
corruption of kernel memory. Entering POD depends on a few words of memory
being correct.
Corrective action: File a bug report.
Note: One hardware problem that can look like a software problem under the guidelines
in Table 2-1 occurs when some (but not all) processors stop executing instructions
normally while at least one CPU continues to execute. If you suspect this may be the
case, see Section 2.5.4, “Using a Debug Kernel to Find System Hangs.”
2.2.2 What to Do if the System Has Been Reset or Rebooted
If the system has already been reset or rebooted by the time you examine it, the state
of the hang has been lost and there is nothing left to examine. Ask the customer to allow
you to examine the system the next time it hangs.
If the customer cannot wait for you to arrive after the system hangs, ask the customer to
use the System Controller to issue a nonmaskable interrupt (NMI), which should create a
system core dump.
If the system dumps core after the customer issues an NMI, then you should suspect a
software problem, in particular with the operating system (IRIX). If the system doesn’t
respond, then the hardware may be at fault. If there is a hardware problem, the
/var/adm/SYSLOG file may contain kernel messages preceding the hang that should be
included in any bug reports.
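
As a rough illustration of gathering the messages that precede a hang, the sketch below returns the last lines of a log file so they can be attached to a bug report. The function name tail_lines and the default of 50 lines are assumptions made for this example, not part of IRIX or of the procedure above. The sketch does not filter for kernel messages, because the SYSLOG line format is not described here.

```python
from collections import deque


def tail_lines(path, count=50):
    """Return the last `count` lines of a text file, without trailing newlines.

    `count` is an arbitrary default chosen for illustration; pick a value that
    covers the period leading up to the hang.
    """
    # errors="replace" keeps the read from failing on stray non-text bytes.
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    # Path taken from the text above; reading it may require privileges.
    for line in tail_lines("/var/adm/SYSLOG"):
        print(line)
```

Run against a copy of the SYSLOG file, this prints the most recent entries, which can then be reviewed by hand for kernel messages before they are included in the report.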
2.3 Diagnosing a System Panic
When the IRIX kernel panics, it displays one of several error messages and then stops
running purposefully. There are both hardware and software causes for kernel panics. To
determine the cause of the panic, collect the messages that the kernel printed at panic time
and classify them.
At panic time, messages are displayed on the system console; this is useful only if the
system console is set to the serial port console. The kernel then attempts a core dump, in
which the messages are stored. At boot time, the utility savecore(1M) copies the panic
messages into the file /var/adm/SYSLOG and stores the core dump in a file called
/var/adm/crash/vmcore.N.comp. In the actual filename, N is a number that identifies each
particular core dump if there is more than one dump file in the crash directory. You can
examine panic messages in either SYSLOG or vmcore.N.comp files.
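
The naming scheme described above can be sketched in a few lines of code. This example lists the dump files in a crash directory and extracts N from each vmcore.N.comp name. The helper names dump_number and list_dumps are hypothetical, and the sketch assumes only the filename pattern and directory given in the text; it does not read or decompress the dumps themselves.

```python
import re
from pathlib import Path

# Filename pattern taken from the text: vmcore.N.comp, where N is a number.
VMCORE_RE = re.compile(r"^vmcore\.(\d+)\.comp$")


def dump_number(filename):
    """Return N for a name of the form vmcore.N.comp, or None otherwise."""
    match = VMCORE_RE.match(filename)
    return int(match.group(1)) if match else None


def list_dumps(crash_dir="/var/adm/crash"):
    """Return (N, path) pairs for each core dump in crash_dir, lowest N first."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    found = []
    for entry in directory.iterdir():
        number = dump_number(entry.name)
        if number is not None:
            found.append((number, entry))
    # Sort numerically so that vmcore.10.comp follows vmcore.2.comp.
    return sorted(found, key=lambda pair: pair[0])


if __name__ == "__main__":
    for number, path in list_dumps():
        print(number, path)
```

Sorting on the numeric value of N rather than on the filename avoids the usual pitfall where a plain string sort places vmcore.10.comp ahead of vmcore.2.comp.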










