
Troubleshooting


Sessions cannot be opened on the HPI server.

Error code: SA_ERR_HPI_OUT_OF_SPACE

The HPI server has reached its limit of 32 simultaneously open sessions. Close a few sessions and try again, or use one of the already open sessions for the necessary operations.

Previous sessions may have been abandoned without a call to saHpiSessionClose(), which leaves them in an orphaned state. Orphaned sessions are closed after 60 seconds, so wait 60 seconds and try again.
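The recovery advice above can be sketched as a small client-side wrapper. This is a Python model of the logic only, not the HPI C API: `open_session` is a hypothetical callable standing in for whatever the client uses to open a session, and the status strings simply mirror the error code names. The 60-second wait comes from the text; the `max_waits` limit is an assumption added for illustration.

```python
import time

OK = "SA_OK"
OUT_OF_SPACE = "SA_ERR_HPI_OUT_OF_SPACE"
ORPHAN_TIMEOUT_S = 60  # orphaned sessions are closed after 60 seconds (from the text)


def open_with_orphan_wait(open_session, sleep=time.sleep, max_waits=1):
    """Try to open a session; on OUT_OF_SPACE, wait for orphans to expire and retry.

    open_session is a hypothetical callable returning (status, session_id).
    sleep is injectable so the wait can be replaced in tests.
    """
    status, session_id = open_session()
    waits = 0
    while status == OUT_OF_SPACE and waits < max_waits:
        sleep(ORPHAN_TIMEOUT_S)  # give the server time to reap orphaned sessions
        waits += 1
        status, session_id = open_session()
    return status, session_id
```

If the second attempt still reports SA_ERR_HPI_OUT_OF_SPACE, the limit is probably being held by live sessions rather than orphaned ones, and some of them need to be closed explicitly.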

Sessions cannot be opened on the HPI server due to a response timeout.

Error code: SA_ERR_HPI_NO_RESPONSE

An SA_ERR_HPI_NO_RESPONSE during a routine HPI operation usually maps to the IPMI 0xC3 (request timed out) response code. This is not normally expected, but if the IPMC is busy responding to other requests, the response to a request can be delayed beyond the 250 msec IPMI request timeout period. The request then times out on the Shelf Manager side, and an SA_ERR_HPI_NO_RESPONSE return code is sent for the corresponding HPI request.

The HSD tries each IPMI command 3 times, so all 3 tries have to time out before the SA_ERR_HPI_NO_RESPONSE return code is sent back. In live operation this is very unlikely, but it can happen in specific situations.

After a power-on operation is initiated, the IPMC is likely to be very busy because the ShMS is reading SDRs and FRU data from it. Power budgeting and E-Keying can also occur while SDRs are being read. If IPMI requests to retrieve the power state (done by reading the HS sensor) are sent during this time, the response can be delayed because the IPMC is handling many other requests at the same time. One of the 3 tries will more often than not succeed, but perhaps once in 1000 attempts all 3 tries are delayed, which results in an SA_ERR_HPI_NO_RESPONSE return code. Reissuing that HPI request should succeed.
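The advice to reissue the request can be sketched as follows. This is a Python model rather than HPI client code: `request` is a hypothetical callable that performs one HPI request and returns a status string, and the `max_attempts` limit is an assumption for illustration. The 250 msec timeout and the 3 tries per IPMI command are taken from the text; together they imply that a single HPI request spends roughly 0.75 seconds on timeouts before it reports SA_ERR_HPI_NO_RESPONSE.

```python
NO_RESPONSE = "SA_ERR_HPI_NO_RESPONSE"
IPMI_TIMEOUT_S = 0.250  # per-try IPMI request timeout (from the text)
IPMI_TRIES = 3          # tries per IPMI command (from the text)

# Time spent on timeouts before one HPI request reports NO_RESPONSE.
WORST_CASE_S = IPMI_TRIES * IPMI_TIMEOUT_S


def reissue_on_no_response(request, max_attempts=3):
    """Call request() until it returns something other than NO_RESPONSE.

    request is a hypothetical callable that returns an HPI status string.
    Returns (last_status, attempts_used).
    """
    status = NO_RESPONSE
    for attempt in range(1, max_attempts + 1):
        status = request()
        if status != NO_RESPONSE:
            return status, attempt
    return status, max_attempts
```

A caller polling the power state during power-on could wrap that single request this way instead of treating the first SA_ERR_HPI_NO_RESPONSE as a hard failure.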

A hot-swap cannot be completed because of a communications timeout.

Error indication: The hot-swap LED does not start or complete its deactivation or activation sequence.

The M state of the ShMC/IPMC has no bearing on message timeouts and retries. The ShMC and other IPMCs all share the same messaging infrastructure.

All requests originating from the ShMC/IPMC are retried three times at 250 msec intervals. If all three retries get no response, a timeout response code (0xC3) is sent back to the requester. The requester might be an internal module of the IPMC (e.g. the hot-swap state machine that generates hot-swap events), or it might be an application running on the LMP that sent the request through the ShMC (e.g. the Shelf Manager).

The requester might decide to issue the request again, in which case the ShMC performs the same three retries for the new request. In the case of hot-swap events, the IPMC generating the event automatically keeps resending the event until it gets a success return code from the Shelf Manager. This guarantees that the Shelf Manager always receives hot-swap events from all IPMCs in the shelf, but the hot-swap deactivation/activation process can hang for as long as no success return code is received.
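The two retry layers described above can be modeled in a few lines. This is an illustrative Python sketch, not firmware code: `send_once` is a hypothetical callable that returns a response code, or `None` when nothing arrives within the 250 msec interval, and the model assumes three attempts in total per request. The text describes the hot-swap event being resent without limit; the `max_rounds` cap is an assumption added so that a stuck sequence is reported instead of looping forever.

```python
RESPONSE_OK = 0x00
RESPONSE_TIMEOUT = 0xC3  # IPMI "request timed out" response code
TRIES = 3                # attempts per request (assumed to be three in total)
INTERVAL_S = 0.250       # spacing between attempts (from the text)


def send_with_retries(send_once):
    """Model one request: up to TRIES attempts, then report 0xC3.

    send_once is a hypothetical callable returning a response code,
    or None when no response arrived within INTERVAL_S.
    """
    for _ in range(TRIES):
        code = send_once()
        if code is not None:
            return code
    return RESPONSE_TIMEOUT


def deliver_hot_swap_event(send_once, max_rounds):
    """Model the IPMC resending a hot-swap event until it succeeds.

    Returns the number of rounds used, or None if max_rounds was reached
    without a success code (the "hang" case described in the text).
    """
    for round_no in range(1, max_rounds + 1):
        if send_with_retries(send_once) == RESPONSE_OK:
            return round_no
    return None
```

In this model, a hot-swap LED that never completes its sequence corresponds to `deliver_hot_swap_event` returning `None`: every round ended in 0xC3 and no success code was ever received.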

RMCP sessions cannot be opened on the Shelf Manager.

• Verify connectivity to the IP address by connecting with Telnet.

• RMCP sessions can only be opened on the active Shelf Manager. If you are using the IP address of an SCM, verify that it is hosting the active Shelf Manager. You can use the Shelf Manager IP address instead to ensure that you reach the active Shelf Manager.

• The RMCP server may have reached its limit of 32 simultaneously open sessions. Close a few sessions and try again or, if possible, use one of the already open sessions for the necessary operations.
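The first check in the list above can be scripted when a Telnet client is not at hand. The following is a minimal Python sketch that only tests whether a TCP connection to the Telnet port can be established. It assumes the standard Telnet port 23; the `connect` parameter is a hypothetical hook for substituting a different connector. A successful result shows that the IP address is reachable, not that the RMCP service itself is accepting sessions.

```python
import socket


def can_reach(host, port=23, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be established.

    Port 23 is the standard Telnet port. connect defaults to
    socket.create_connection and can be replaced for testing.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
    conn.close()
    return True
```

Calling `can_reach` with the IP address intended for the RMCP session gives a quick yes/no answer before moving on to the other two checks.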

Resource presence table (RPT) has only one resource entry present: ResourceId 0x1

This can happen if the Shelf Manager could not find a valid shelf FRU device and was therefore unable to initialize the shelf. The RPT then has only one resource, the one associated with the shelf. Sensor 0x1000 (shelf FRU info valid sensor) in the ATCA shelf resource generates an alarm to inform the user that the shelf FRU information is corrupt and needs to be reset.

After the shelf FRU information has been reset, power cycling the module hosting the active Shelf Manager should allow the Shelf Manager to fully initialize the shelf and complete resource discovery. If power cycling the module does not resolve the problem, power cycle the shelf.

Table 15. Symptom/response for Shelf Manager problems (continued)

Symptom Response