Technical data

CHALLENGE/Onyx Diagnostic Road Map 2-5

To examine the messages stored in the compressed kernel core dump file, use the

uncompvm(1M) command. For example,

/usr/etc/uncompvm -h vmcore.N.comp

The -h option uncompresses only the header of the file vmcore.N.comp, where the kernel

panic messages are stored. Panic messages are indicated by the string pb followed by the

message number. For example,

panic string: <0>PANIC: User requested vmcore dump (NMI)

kernel putbuf:

pb 7:

pb 8: <0>PANIC: User requested vmcore dump (NMI)

pb 9:

pb 10: Dumping to dev 0x2000011 at block 0, space: 0x10000 pages

The string pb indicates messages that were printed by the kernel routine putbuf and placed

in the circular message buffer.
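
As a rough illustration of how the header output shown above could be post-processed, the following Python sketch extracts the numbered pb entries and picks out those containing a panic message. The function names parse_putbuf and panic_entries are hypothetical, and the pattern assumes that every entry has the form "pb N: text" as in the example; it is not part of uncompvm.

```python
import re

# Assumed line shape, taken from the example output: "pb 8: <0>PANIC: ..."
PB_LINE = re.compile(r"^pb\s+(\d+):\s?(.*)$")

def parse_putbuf(text):
    """Return (message_number, message) pairs for every pb line in text."""
    entries = []
    for line in text.splitlines():
        match = PB_LINE.match(line.strip())
        if match:
            entries.append((int(match.group(1)), match.group(2).strip()))
    return entries

def panic_entries(entries):
    """Keep only the entries whose text contains the word PANIC."""
    return [(number, msg) for number, msg in entries if "PANIC" in msg]

# Sample text copied from the example output in this section.
sample = """\
panic string: <0>PANIC: User requested vmcore dump (NMI)
kernel putbuf:
pb 7:
pb 8: <0>PANIC: User requested vmcore dump (NMI)
pb 9:
pb 10: Dumping to dev 0x2000011 at block 0, space: 0x10000 pages
"""

if __name__ == "__main__":
    for number, message in panic_entries(parse_putbuf(sample)):
        print(number, message)
```

In practice the text would come from the saved output of the uncompvm command rather than from an embedded string.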

You can also examine the text file /var/adm/SYSLOG and look for kernel putbuf messages.

For example:

Oct 18 16:38:47 2E:IRIS savecore: reboot after panic: <0>PANIC:

User requested vmcore dump (NMI)

Oct 18 16:38:47 2E:IRIS savecore: pb 0: 4>WARNING: STREAMS

interrupt block unavailable.
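
A minimal Python sketch of this search follows. It assumes only that the relevant entries carry the tag "savecore:" as in the example; the function name savecore_lines is hypothetical, and the third sample line is a made-up placeholder that shows a non-matching entry.

```python
def savecore_lines(lines):
    """Yield the text after 'savecore:' for each line logged by savecore."""
    marker = "savecore:"
    for line in lines:
        _, found, rest = line.partition(marker)
        if found:
            yield rest.strip()

# The first two lines are the example entries from this section, with the
# wrapped text rejoined. The third is a placeholder, not real log output.
sample = [
    "Oct 18 16:38:47 2E:IRIS savecore: reboot after panic: <0>PANIC: "
    "User requested vmcore dump (NMI)",
    "Oct 18 16:38:47 2E:IRIS savecore: pb 0: 4>WARNING: STREAMS "
    "interrupt block unavailable.",
    "placeholder line without the tag",
]

if __name__ == "__main__":
    for text in savecore_lines(sample):
        print(text)
```

To scan the real log, the list could be replaced by an open file object for /var/adm/SYSLOG, since iterating over a file yields its lines.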

Because the panic messages are from a circular buffer, you will often see a wraparound

effect.
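
The wraparound effect can be illustrated with a generic circular buffer. The Python sketch below is not the kernel's implementation; the class RingBuffer and its methods are hypothetical and show only why a buffer read in storage order can list newer messages ahead of older ones.

```python
class RingBuffer:
    """Fixed-size message buffer that overwrites its oldest slot when full."""

    def __init__(self, size):
        self.slots = [None] * size
        self.next_slot = 0   # index the next message will be written to
        self.count = 0       # total number of messages written so far

    def put(self, message):
        self.slots[self.next_slot] = message
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        self.count += 1

    def raw(self):
        """Slots in storage order, as a straight dump would show them."""
        return list(self.slots)

    def chronological(self):
        """Messages from oldest to newest, undoing the wraparound."""
        if self.count < len(self.slots):
            return self.slots[:self.count]
        return self.slots[self.next_slot:] + self.slots[:self.next_slot]

buf = RingBuffer(4)
for text in ["m0", "m1", "m2", "m3", "m4", "m5"]:
    buf.put(text)

if __name__ == "__main__":
    print(buf.raw())            # storage order: newest entries come first
    print(buf.chronological())  # oldest to newest
```

After six messages are written to four slots, the two newest messages occupy the first two slots, so the storage order no longer matches the order in which the messages were printed.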

2.4 ASIC Error Detection

Some of the messages displayed when a system hangs contain clues to the origin of the

problem. At various points in the Everest system, the accuracy of the information being

transferred is checked. Different error-checking methods are used, depending on the

particular system interface. These methods include parity bits, error correction codes

(ECCs), time-outs, or a combination of several methods.
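
As a generic illustration of the simplest of these methods, the Python sketch below computes and checks a single even-parity bit. It is not the scheme used by any particular Everest ASIC; the function names and the 8-bit example value are assumptions chosen for illustration.

```python
def even_parity_bit(value):
    """Return the bit that makes the total number of 1 bits even."""
    return bin(value).count("1") % 2

def send(value):
    """Sender side: transmit the data together with its parity bit."""
    return value, even_parity_bit(value)

def check(value, parity):
    """Receiver side: True if the data still agrees with the parity bit."""
    return even_parity_bit(value) == parity

word, parity = send(0b10110010)
single_flip = word ^ 0b00000100   # one bit corrupted in transit
double_flip = word ^ 0b00000110   # two bits corrupted in transit

if __name__ == "__main__":
    print(check(word, parity))         # intact data passes the check
    print(check(single_flip, parity))  # a single-bit error is detected
    print(check(double_flip, parity))  # a double-bit error goes unnoticed
```

A single parity bit detects any odd number of flipped bits but misses an even number, which is one reason stronger codes such as ECC, or a combination of methods, are used.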

Errors generally propagate from their point of origin throughout the system. An

error is flagged at both the sending and the receiving end of every interface it crosses, and

an error message is written to CPU-accessible registers. Eventually, the error is recognized

by one or more CPUs, which then take appropriate action. How soon the error is

recognized, and whether the system can identify the origin of the error, depend on

the type of operation that generated the fault.

For example, in an IP19-based system, a memory read provides a high rate of success in

tracing the error back to the origin. If an IP19 issues a memory or PIO read, and that read

generates an error, the CPU takes a synchronous exception. Because the exception handler

is invoked so soon after the error, there is a good possibility that the cause of the error can

be determined.