6
Troubleshooting
Symptom: SCM resets abruptly, forcing the LMP to reboot. There are no associated hot-swap events.
Response: Possible causes and steps to take are as follows:
1. The SCM LMP hung, most likely because a process running on the LMP entered an error state that
consumed all CPU cycles. This starves the Shelf Manager of the CPU cycles it needs to strobe the
IPMC Watchdog, so the watchdog timer expires and resets the LMP. In this case, a WATCHDOG event from
the reset SCM should appear in the Domain Event Log of the active SCM.
Because it is difficult to identify the process that caused the LMP reset, periodically monitor the CPU usage
of the processes running on the LMP after the reset to study the usage pattern on that SCM. The Linux
'top' utility is one way to get this data. If a process shows occasional CPU usage spikes, this might be a
chronic condition that recurs but usually does no harm. When the issue occurred, that process's CPU usage
presumably spiked more than usual and caused the watchdog timer (WDT) to expire. Collecting data after
the SCM has reset makes it possible to determine whether such a process exists. If so, Radisys technical
support can do a more focused investigation of that specific process.
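The periodic monitoring described in cause 1 can be sketched in Python as an alternative to running 'top' by hand. This is only an illustration: it assumes a Python interpreter is available on the LMP and that the standard Linux /proc/<pid>/stat layout applies. The function names and the sampling interval are hypothetical and are not part of any Radisys tool.

```python
import os
import time


def parse_stat_line(line):
    """Return (pid, command, cpu_ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    left, _, rest = line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    # tail starts at field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return int(left), comm, utime + stime


def snapshot(proc_root="/proc"):
    """Map pid -> (command, cpu_ticks) for every process currently visible."""
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as handle:
                pid, comm, ticks = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        result[pid] = (comm, ticks)
    return result


def busiest(before, after, count=5):
    """Rank processes by CPU ticks consumed between two snapshots."""
    deltas = []
    for pid, (comm, ticks) in after.items():
        if pid in before:
            deltas.append((ticks - before[pid][1], pid, comm))
    return sorted(deltas, reverse=True)[:count]


def monitor(interval_seconds, samples):
    """Print the busiest processes once per interval, with a timestamp."""
    previous = snapshot()
    for _ in range(samples):
        time.sleep(interval_seconds)
        current = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for delta, pid, comm in busiest(previous, current):
            print(stamp, pid, comm, delta)
        previous = current
```

Calling, for example, monitor(10, 360) would log one hour of samples at an assumed 10-second interval. Redirecting the output to a file in persistent storage keeps the record available if the SCM resets again.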
2. The Shelf Manager (ShMS) that strobes the IPMI watchdog crashed and could not restart within 60
seconds.
If the ShMS has a catastrophic fault and crashes, it is immediately restarted by a keepalive process. The
keepalive process first resets the IPMI Watchdog to 60 seconds, then copies all logs and process data
to persistent storage at /var/lib/shmgr/diagnostics, and then restarts the Shelf Manager. The newly started
Shelf Manager then restarts the IPMI watchdog strobe process. If for some reason the ShMS is not
restarted within 60 seconds of crashing, the watchdog expires and reboots the SCM. To check whether
a ShMS crash occurred, inspect the contents of the /var/lib/shmgr/diagnostics folder after the SCM has
come back up from the reset. If files in that folder have timestamps matching the time of the SCM reset,
a ShMS crash is confirmed. In that case, tar up the /var/lib/shmgr/diagnostics folder and send it to
Radisys Technical Support for investigation.
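The timestamp check and archive step in cause 2 can be sketched as follows. This is a minimal Python illustration, not a Radisys utility: the function names and the two-minute matching window are assumptions, and the reset time must come from another source, such as the WATCHDOG entry in the Domain Event Log.

```python
import os
import tarfile


def files_near(directory, reset_time, window_seconds=120):
    """List files under directory modified within window_seconds of reset_time.

    reset_time is a POSIX timestamp (seconds since the epoch).
    """
    matches = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if abs(os.path.getmtime(path) - reset_time) <= window_seconds:
                matches.append(path)
    return sorted(matches)


def archive(directory, output_path):
    """Pack directory into a gzip-compressed tar file and return its path."""
    top_level = os.path.basename(os.path.normpath(directory))
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(directory, arcname=top_level)
    return output_path
```

If files_near("/var/lib/shmgr/diagnostics", reset_time) returns a non-empty list, archive("/var/lib/shmgr/diagnostics", "/tmp/diagnostics.tar.gz") produces a single file to send; the output path here is only an example and should lie outside the folder being archived.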
Symptom: HPI application cannot load the HPI client library (HCL).
Response: Ensure the library was built using the autobuild package provided for the HCL on the host machine
where the application is intended to run. Also, install the HCL in the common library location of the host
machine so the HPI application can find it at run time.
Symptom: Sessions cannot be opened on the HPI server.
Error codes: SA_ERR_HPI_NO_RESPONSE or SA_ERR_HPI_INVALID_DOMAIN
Response: Ensure the HPI service is running on the SCM. Because the HPI server is integrated with the shelf
management server, the same troubleshooting methods described in General issues on page 121 can be
used here.
If the HPI application is being run from a remote location, ensure the HPI-specific environment variables are
set up correctly and the correct DomainId is being used.
For opening sessions on the HPI service running on the active Shelf Manager, set the environment variable
SAHPI_UNSPECIFIED_DOMAIN_ID to the common Shelf Manager IP address of the shelf where HPI
management access is desired. The domain identifier SAHPI_UNSPECIFIED_DOMAIN_ID can then be
used to open sessions.
See Changing Shelf Settings Using HPI on page 53 for details.
Ensure the network interfaces and routing tables of the host machine running the HPI application are
configured correctly to communicate with the Shelf Manager over the network.
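As an illustration of the environment-variable step above, the following Python sketch prepares an environment in which SAHPI_UNSPECIFIED_DOMAIN_ID carries the common Shelf Manager IP address before a remote HPI application is launched. The helper name is hypothetical, and the IP validation is an added convenience rather than something the HPI client library requires.

```python
import ipaddress
import os


def hpi_environment(shelf_manager_ip, base_env=None):
    """Return a copy of the environment with the HPI domain variable set."""
    # Validate first so a typo fails here rather than as a session-open error.
    address = ipaddress.ip_address(shelf_manager_ip)
    env = dict(os.environ if base_env is None else base_env)
    env["SAHPI_UNSPECIFIED_DOMAIN_ID"] = str(address)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the HPI application, for example subprocess.run(["./my_hpi_app"], env=hpi_environment("192.0.2.10")), where both the application name and the address are placeholders.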
Symptom: Sessions cannot be opened on the HPI server.
Error code: SA_ERR_HPI_BUSY
Response: If this error is received following shelf power-on, the HPI server is still initializing and going
through the discovery process.
During the shelf power-on reset, it can take 2–3 minutes to discover all the entities present in
the chassis, especially if the chassis is fully loaded. Wait one minute and retry the session open request.
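The wait-and-retry advice for SA_ERR_HPI_BUSY can be sketched as a small loop. This Python illustration does not call the real HPI API: open_session is a hypothetical callable supplied by the application, and HpiBusy is a stand-in exception for the SA_ERR_HPI_BUSY result. The attempt count is also an assumption, chosen so that one-minute waits cover the 2–3 minute discovery period.

```python
import time


class HpiBusy(Exception):
    """Hypothetical stand-in for an SA_ERR_HPI_BUSY result."""


def open_with_retry(open_session, attempts=4, wait_seconds=60, sleep=time.sleep):
    """Call open_session until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return open_session()
        except HpiBusy:
            if attempt == attempts:
                raise  # still busy after the last attempt; let the caller decide
            sleep(wait_seconds)
```

Errors other than the busy condition are deliberately not retried, since the other error codes in this table point to configuration or connectivity problems that waiting does not fix.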
Table 15. Symptom/response for Shelf Manager problems (continued)
Symptom Response