Specifications

6
Troubleshooting
After making any shmgr.conf file update, remember to synchronize it to the standby SCM
from the platform management CLI before rebooting the SCMs. The procedure from the
active SCM is:
mcli
platformmgmt
shelfmgmt
configure sync_config
Responding to a ResourceFailed alarm
The event SAHPI_RESE_RESOURCE_FAILURE is received both when a FRU is extracted too
quickly (that is, without the hot swap LED powered on) and when there is an IPMC failure.
Because there is no "hardware presence detect" method for boards in ATCA shelves, it is
difficult for the Shelf Manager to distinguish between a surprise extraction and a real IPMC
failure. One difference between a surprise extraction and an IPMC failure is in how the
'recovery' event is received for the two cases, as follows:
1. For a RESOURCE_FAILED state induced by a surprise extraction, there is no
RESOURCE_RESTORED event. When a new FRU is inserted in the same slot, you directly
receive a hot swap event from the last known good state to NOT_PRESENT (e.g. ACTIVE
> NOT_PRESENT) for the failed resource, with the event's severity set to the
resource's severity (SAHPI_MAJOR for Front Boards). This is followed by the
NOT_PRESENT > INACTIVE event for the resource of the newly inserted FRU in that slot.
2. For a RESOURCE_FAILED state induced by an IPMC failure, you receive a
RESOURCE_RESTORED event when the IPMC recovers from the failure. No hot swap events
are received in this case. Resource failures induced by an IPMC failure are highly
unlikely. Radisys IPMCs use a Watchdog run by the IPMC FPGA. The IPMC strobes the
FPGA-maintained Watchdog periodically. If the IPMC fails, that watchdog expires and
the FPGA resets the IPMC to recover it from its failure. IPMCs from most vendors
employ similar logic to protect against unrecoverable IPMC failures. The chance of
the IPMC failing continuously for more than 5 seconds (the time it takes for the
ShMS to ping the IPMC twice as a communication test) is very low. An IPMB failure
can also cause a RESOURCE_FAILED condition, which can clear later if the IPMC
recovers from the IPMB error.
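The watchdog behavior described in item 2 can be pictured with a small toy model. This is only an illustrative sketch, not the Radisys FPGA logic: the class name FpgaWatchdog, its methods, and the timeout value used in the example are hypothetical, and time is passed in explicitly so the model stays deterministic.

```python
class FpgaWatchdog:
    """Toy model of a watchdog that resets the IPMC if it is not strobed in time.

    Hypothetical names and timings; the manual does not give the real timeout.
    """

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_strobe = 0.0
        self.resets = 0

    def strobe(self, now: float) -> None:
        """Called periodically by a healthy IPMC."""
        self.last_strobe = now

    def tick(self, now: float) -> bool:
        """Return True if the watchdog expired and the IPMC was reset."""
        if now - self.last_strobe > self.timeout_s:
            self.resets += 1
            # After the reset the IPMC restarts, so the timer starts over.
            self.last_strobe = now
            return True
        return False


# Example with an assumed 1-second timeout: the IPMC strobes once, then hangs.
wd = FpgaWatchdog(timeout_s=1.0)
wd.strobe(now=0.5)
still_ok = wd.tick(now=1.0)    # within the timeout, no reset
was_reset = wd.tick(now=2.0)   # timeout exceeded, IPMC is reset
```

As long as the reset brings the IPMC back well within the 5-second window mentioned above, the Shelf Manager's communication test would not be expected to fail twice in a row.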
The next event after a ResourceFailed event can provide some information on whether the
FRU was surprise extracted or simply lost communication for a time while still being
present in that slot.
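As a minimal sketch of that distinction, the following classifies the first event that arrives for a slot after a ResourceFailed event. The Event type, its field names, and the returned labels are hypothetical stand-ins for illustration; a real application would read the corresponding fields from the HPI event structures instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Simplified, hypothetical view of an event delivered for a slot."""
    kind: str                          # "RESOURCE_RESTORED" or "HOTSWAP"
    slot: int
    prev_state: Optional[str] = None   # hot swap events only
    new_state: Optional[str] = None    # hot swap events only


def classify_followup(event: Event) -> str:
    """Classify the first event seen after RESOURCE_FAILED for a slot."""
    if event.kind == "RESOURCE_RESTORED":
        # The FRU stayed in the slot and communication came back.
        return "communication_loss"
    if event.kind == "HOTSWAP" and event.new_state == "NOT_PRESENT":
        # e.g. ACTIVE > NOT_PRESENT, reported when a new FRU is inserted.
        return "surprise_extraction"
    return "undetermined"


restored = classify_followup(Event("RESOURCE_RESTORED", slot=3))
extracted = classify_followup(
    Event("HOTSWAP", slot=3, prev_state="ACTIVE", new_state="NOT_PRESENT")
)
```

Any other event type leaves the cause undetermined, which matches the point that only the follow-up event separates the two cases.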
You can choose to simply mark a FRU as failed when you get the ResourceFailed event and
fail over any critical service running on that FRU to another FRU. Marking it as extracted
might give an operator incorrect information if the FRU has not actually been extracted
from the shelf. From a management perspective, the FRU can be treated as if it were
INACTIVE so that no services are assigned to it. Then, when the next event comes (either a
NOT_PRESENT hot swap event or a RESOURCE_RESTORED event), you can take the appropriate
action by marking the FRU as either extracted (for a NOT_PRESENT event) or reactivated
(for a RESOURCE_RESTORED event).
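One way to sketch this policy is a small per-slot tracker. Everything here is an assumption made for illustration: the FruTracker class, the FruStatus values, the failover callback, and the choice to treat unknown slots as in service are not part of the Shelf Manager or HPI interfaces.

```python
from enum import Enum
from typing import Callable, Dict


class FruStatus(Enum):
    IN_SERVICE = "in_service"
    FAILED = "failed"          # treated like INACTIVE: no services assigned
    EXTRACTED = "extracted"


class FruTracker:
    """Hypothetical management-side view of FRU status, keyed by slot."""

    def __init__(self, failover: Callable[[int], None]) -> None:
        self._status: Dict[int, FruStatus] = {}
        self._failover = failover   # moves critical services off a slot

    def status(self, slot: int) -> FruStatus:
        # Assumption: a slot with no recorded status is in service.
        return self._status.get(slot, FruStatus.IN_SERVICE)

    def on_resource_failed(self, slot: int) -> None:
        # Mark as failed, not extracted, and fail over critical services.
        self._status[slot] = FruStatus.FAILED
        self._failover(slot)

    def on_hotswap(self, slot: int, new_state: str) -> None:
        # A NOT_PRESENT hot swap event confirms the FRU left the shelf.
        if new_state == "NOT_PRESENT":
            self._status[slot] = FruStatus.EXTRACTED

    def on_resource_restored(self, slot: int) -> None:
        # Communication came back, so the failed FRU can be reactivated.
        if self.status(slot) is FruStatus.FAILED:
            self._status[slot] = FruStatus.IN_SERVICE

    def can_assign_services(self, slot: int) -> bool:
        return self.status(slot) is FruStatus.IN_SERVICE


moved = []
tracker = FruTracker(failover=moved.append)
tracker.on_resource_failed(3)        # slot 3 stops responding
tracker.on_resource_restored(3)      # IPMC recovered: reactivate
tracker.on_resource_failed(5)        # slot 5 stops responding
tracker.on_hotswap(5, "NOT_PRESENT") # later confirmed as extracted
```

Keeping FAILED separate from EXTRACTED means the operator view never claims a board has been removed until a NOT_PRESENT event confirms it.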