Specifications
6
Troubleshooting
Sessions cannot be opened on the HPI server.
Error code: SA_ERR_HPI_OUT_OF_SPACE
The HPI server has reached the limit (32) for the number of sessions that can be opened simultaneously.
Close a few sessions and try again, or use any of the already opened sessions for the necessary operations.
Previous sessions may have been closed without issuing the saHpiSessionClose() function, which leaves the
sessions in an orphaned state. Orphaned sessions are closed after 60 seconds, so wait 60 seconds and try
again.
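The wait-and-retry advice above can be sketched as follows. This is a minimal illustration, not the HPI API itself: `open_fn` and `close_fn` are hypothetical stand-ins for whatever wrapper your application uses around saHpiSessionOpen() and saHpiSessionClose(), and status codes are modeled as plain strings. The only values taken from the text are the 60-second orphan timeout and the error name.

```python
import time

SA_OK = "SA_OK"
SA_ERR_HPI_OUT_OF_SPACE = "SA_ERR_HPI_OUT_OF_SPACE"
ORPHAN_TIMEOUT_S = 60  # orphaned sessions are closed after 60 seconds (from the text)


def open_session_with_wait(open_fn, sleep_fn=time.sleep, max_attempts=2):
    """Open a session; on OUT_OF_SPACE wait for orphans to expire, then retry.

    open_fn is a hypothetical callable returning (status, session_id).
    """
    for attempt in range(1, max_attempts + 1):
        status, session_id = open_fn()
        if status != SA_ERR_HPI_OUT_OF_SPACE:
            return status, session_id
        if attempt < max_attempts:
            sleep_fn(ORPHAN_TIMEOUT_S)  # give orphaned sessions time to be closed
    return SA_ERR_HPI_OUT_OF_SPACE, None


def with_session(open_fn, close_fn, work_fn, sleep_fn=time.sleep):
    """Run work_fn(session_id) and always close the session afterwards."""
    status, session_id = open_session_with_wait(open_fn, sleep_fn=sleep_fn)
    if status != SA_OK:
        raise RuntimeError(f"session open failed: {status}")
    try:
        return work_fn(session_id)
    finally:
        close_fn(session_id)  # closing explicitly avoids leaving an orphaned session
```

Closing the session in a `finally` block is the design point here: it keeps an error in the application from leaving a session orphaned and counting against the limit of 32.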
Sessions cannot be opened on the HPI server due to a response timeout.
Error code: SA_ERR_HPI_NO_RESPONSE
An SA_ERR_HPI_NO_RESPONSE during a routine HPI operation usually maps to the IPMI 0xC3 (request
timed out) response code. This is not normally expected. However, if the IPMC is busy responding to other
requests, the response to a request can be delayed beyond the 250 msec IPMI request timeout period. The
request then times out on the Shelf Manager side, and an SA_ERR_HPI_NO_RESPONSE return code is sent
for the corresponding HPI request.
The HSD sends each IPMI command up to 3 times, so all 3 tries have to time out before the
SA_ERR_HPI_NO_RESPONSE return code is sent back. In live operation this is very unlikely but can happen
in specific situations.
After a power-on operation is initiated, the IPMC is likely to be very busy because the ShMS reads SDRs and
FRU data from it. Power budgeting and E-Keying can also occur while the SDRs are being read. If IPMI
requests to retrieve the power state (done by reading the HS sensor) are sent during this time, the response
can be delayed because the IPMC is simultaneously handling many other requests. One of the 3 tries
succeeds more often than not, but perhaps once in 1000 times all 3 tries are delayed, which results in an
SA_ERR_HPI_NO_RESPONSE return code. Reissuing that HPI request should succeed.
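Reissuing a request on SA_ERR_HPI_NO_RESPONSE could look like the sketch below. It is an assumption-laden illustration: `request_fn` is a hypothetical callable that wraps one HPI request and returns `(status, value)`, status codes are modeled as strings, and the limit of two reissues is an arbitrary choice, not a value from this manual.

```python
SA_OK = "SA_OK"
SA_ERR_HPI_NO_RESPONSE = "SA_ERR_HPI_NO_RESPONSE"


def reissue_on_no_response(request_fn, max_reissues=2):
    """Call request_fn(); reissue it only when it reports NO_RESPONSE.

    Returns (status, value, calls), where calls counts every invocation.
    Any other status, success or error, is returned immediately.
    """
    calls = 0
    while True:
        status, value = request_fn()
        calls += 1
        if status != SA_ERR_HPI_NO_RESPONSE or calls > max_reissues:
            return status, value, calls
```

Only the timeout status triggers a reissue, so genuine errors are still reported to the caller on the first call.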
A hot-swap cannot be completed because of a communications timeout.
Error indication: The hot-swap LED does not start or complete its deactivation or activation sequence.
The M state of the ShMC/IPMC has no bearing on message timeouts and retries. The ShMC and other IPMCs
all share the same messaging infrastructure.
All requests originating from the ShMC/IPMC are retried three times at 250 msec intervals. If all three
retries get no response, a timeout response code (0xC3) is sent back to the requester. The requester
might be an internal module of the IPMC (e.g. the hot-swap state machine that generates hot-swap events), or
it might be an application running on the LMP that sent the request through the ShMC (e.g. the Shelf Manager).
The requester might decide to send the request again, in which case the ShMC performs the same three retries
for the new request. In the case of hot-swap events, the IPMC generating the event automatically keeps
retrying the event until it gets a success return code from the Shelf Manager. This guarantees that the Shelf
Manager always receives hot-swap events from all IPMCs in the shelf, but the hot-swap deactivation or
activation process can hang for as long as no success return code is received.
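The retry behavior described above can be modeled in a few lines. This is a simplified sketch, not firmware logic: it assumes three tries in total per request, each with a 250 msec window, and it represents the responder only by the number of the try that finally gets an answer. The function names are hypothetical.

```python
IPMI_CC_OK = 0x00       # IPMI completion code: command completed normally
IPMI_CC_TIMEOUT = 0xC3  # IPMI completion code: request timed out
TRIES = 3               # assumed: three tries in total per request
TRY_TIMEOUT_MS = 250    # each try waits up to 250 msec for a response


def send_request(responds_on_try):
    """Model one request.

    responds_on_try is the 1-based try that gets a response, or None if no
    try is answered. Returns (completion_code, upper bound on elapsed msec).
    """
    for attempt in range(1, TRIES + 1):
        if responds_on_try == attempt:
            return IPMI_CC_OK, attempt * TRY_TIMEOUT_MS
    return IPMI_CC_TIMEOUT, TRIES * TRY_TIMEOUT_MS


def send_hot_swap_event(outcomes):
    """Repeat whole request cycles until one succeeds.

    outcomes is an iterable giving responds_on_try for each cycle. Returns
    the number of cycles used, or None if no cycle succeeded, which is the
    case where the hot-swap sequence appears to hang.
    """
    cycles = 0
    for responds_on_try in outcomes:
        cycles += 1
        code, _ = send_request(responds_on_try)
        if code == IPMI_CC_OK:
            return cycles
    return None
```

Under these assumptions a requester waits at most 3 × 250 = 750 msec before it sees 0xC3 for a single request, while a hot-swap event has no such bound because the cycle is repeated until it succeeds.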
RMCP sessions cannot be opened on the Shelf Manager.
• Verify connectivity to the IP address by connecting with Telnet.
• RMCP sessions can only be opened on the active Shelf Manager. If using the IP address of an SCM, verify
that it is hosting the active Shelf Manager. You can use the Shelf Manager IP address instead to ensure
you reach the active Shelf Manager.
• The RMCP server may have reached the limit (32) for the number of sessions that can be opened
simultaneously. Close a few sessions and try again or, if possible, use any of the already opened sessions
for the necessary operations.
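The Telnet connectivity check in the first bullet can be scripted as a plain TCP connection attempt. This is a minimal sketch under the assumption that the Telnet service listens on the standard TCP port 23. It only shows that the address is reachable and accepting connections, and says nothing about the RMCP server itself or about which module hosts the active Shelf Manager.

```python
import socket

TELNET_PORT = 23  # standard Telnet TCP port (assumed to be the one in use)


def can_connect(host, port=TELNET_PORT, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

A call such as `can_connect(shelf_manager_ip)` with the address you intend to use for RMCP gives a quick yes/no answer before you look at session limits.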
Resource presence table (RPT) has only one resource entry present: ResourceId 0x1
This can happen if the Shelf Manager was not able to find a valid shelf FRU device and was thus not able to
initialize the shelf. The RPT will then have only one resource associated with the shelf. Sensor 0x1000 (shelf
FRU info valid sensor) in the ATCA shelf resource will generate an alarm to inform the user that the shelf FRU
information is corrupt and needs to be reset.
Following a reset of the shelf FRU information, power cycling the module hosting the active Shelf Manager
should allow the Shelf Manager to fully initialize the shelf and thereby allow full resource discovery. If the
problem is not resolved by a module reboot, power cycle the shelf.
Table 15. Symptom/response for Shelf Manager problems (continued)
Symptom Response