After making any shmgr.conf file update, remember to synchronize it to the
standby SCM from the platform-management CLI before rebooting the SCMs. The
procedure from the active SCM is:
mcli
platform-mgmt
shelf-mgmt
configure sync_config
Responding to a ResourceFailed alarm
The event SAHPI_RESE_RESOURCE_FAILURE is received both when a FRU is extracted
too quickly (that is, without the hot-swap LED powered on) and when there is an
IPMC failure. Because there is no “hardware presence detect” method for boards
in ATCA shelves, it is difficult for the Shelf Manager to distinguish between a
surprise extraction and a real IPMC failure. One difference between a surprise
extraction and an IPMC failure is in how the 'recovery' event is received for
the two cases, as follows:
1. For a surprise extraction-induced RESOURCE_FAILED state there is no
RESOURCE_RESTORED event. When a new FRU is inserted in the same slot, you
directly receive a hot-swap event from the last known good state to
“NOT_PRESENT” (e.g. ACTIVE->NOT_PRESENT) for the failed resource, with the
event's severity set to the resource's severity (SAHPI_MAJOR for Front Boards).
This is followed by the NOT_PRESENT->INACTIVE event for the resource for the
newly inserted FRU in that slot.
2. For an IPMC failure-induced RESOURCE_FAILED state you receive a
RESOURCE_RESTORED event when the IPMC recovers from the failure. No hot-swap
events are received in this case. IPMC failure-induced resource failures are
highly unlikely. Radisys IPMCs use a Watchdog run by the IPMC FPGA. The IPMC
strobes the FPGA-maintained Watchdog periodically. If the IPMC fails, that
watchdog expires and the FPGA resets the IPMC to recover it from its failure.
IPMCs from most vendors employ similar logic to protect against unrecoverable
IPMC failures. The chance of the IPMC failing continuously for more than
5 seconds (the time it takes for the ShMS to ping the IPMC twice for a
communication test) is very low. Of course, an IPMB failure can cause a
RESOURCE_FAILED condition that can recover later if the IPMC can recover itself
from the IPMB error.
The next event after a ResourceFailed event can provide some information on
whether the FRU was surprise extracted or simply lost communication for a time
while still being present in that slot.
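The distinction drawn above can be sketched as a small classifier. This is only an illustration, not part of the Shelf Manager or of any HPI library: the event dictionary layout, the FailureCause names, and the function classify_resource_failure are all hypothetical, and a real application would read these values from the HPI events it receives.

```python
from enum import Enum
from typing import Optional


class FailureCause(Enum):
    """Hypothetical labels for why a resource entered RESOURCE_FAILED."""
    SURPRISE_EXTRACTION = "surprise_extraction"
    COMMUNICATION_LOSS = "communication_loss"
    UNKNOWN = "unknown"


def classify_resource_failure(next_event: Optional[dict]) -> FailureCause:
    """Classify a ResourceFailed condition from the event that follows it.

    next_event is an assumed, simplified representation of the follow-up
    event, e.g. {"type": "HOTSWAP", "state": "NOT_PRESENT"} or
    {"type": "RESOURCE_RESTORED"}. None means nothing has arrived yet.
    """
    if next_event is None:
        # No follow-up event yet: the two cases cannot be told apart.
        return FailureCause.UNKNOWN
    if next_event.get("type") == "RESOURCE_RESTORED":
        # The FRU stayed in the slot and communication came back.
        return FailureCause.COMMUNICATION_LOSS
    if (next_event.get("type") == "HOTSWAP"
            and next_event.get("state") == "NOT_PRESENT"):
        # A transition to NOT_PRESENT with no RESOURCE_RESTORED before it.
        return FailureCause.SURPRISE_EXTRACTION
    return FailureCause.UNKNOWN
```

Returning UNKNOWN until a follow-up event arrives reflects the point that the ResourceFailed event alone does not identify the cause.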
You can choose to simply mark a FRU as failed when you get the ResourceFailed
event and fail over any critical service running on that FRU to another FRU.
Marking it as extracted might provide incorrect information to an operator in
case the FRU is not actually extracted from the shelf. From a management
perspective the FRU can be treated as if it is INACTIVE so no services are
assigned to it. Then when the next event comes (either a NOT_PRESENT hot-swap
event or a RESOURCE_RESTORED event) you can take appropriate action by marking
the FRU as either extracted (for a NOT_PRESENT event) or reactivated (for a
RESOURCE_RESTORED event).
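One possible way to implement this policy in a management application is a per-slot status tracker. The sketch below is a minimal illustration under stated assumptions: FruTracker, FruStatus, the plain string event names, and the failover callback are hypothetical, and slots that have not reported a failure are assumed to be active.

```python
from enum import Enum
from typing import Callable, Dict


class FruStatus(Enum):
    """Hypothetical management-side view of a FRU."""
    ACTIVE = "active"
    FAILED = "failed"        # treated like INACTIVE: no services assigned
    EXTRACTED = "extracted"


class FruTracker:
    """Tracks per-slot status and triggers failover on ResourceFailed."""

    def __init__(self, failover: Callable[[int], None]) -> None:
        self._status: Dict[int, FruStatus] = {}
        self._failover = failover  # assumed callback that moves services away

    def status(self, slot: int) -> FruStatus:
        # Assumption: a slot with no recorded failure is considered active.
        return self._status.get(slot, FruStatus.ACTIVE)

    def on_event(self, slot: int, event: str) -> FruStatus:
        current = self.status(slot)
        if event == "RESOURCE_FAILURE":
            if current is not FruStatus.FAILED:
                # Mark as failed, not extracted, and move critical services.
                self._status[slot] = FruStatus.FAILED
                self._failover(slot)
        elif current is FruStatus.FAILED and event == "NOT_PRESENT":
            # The follow-up hot-swap event shows the FRU was extracted.
            self._status[slot] = FruStatus.EXTRACTED
        elif current is FruStatus.FAILED and event == "RESOURCE_RESTORED":
            # Communication recovered; the FRU can be reactivated.
            self._status[slot] = FruStatus.ACTIVE
        return self.status(slot)
```

For example, FruTracker(print).on_event(3, "RESOURCE_FAILURE") would mark slot 3 as failed and call the callback with 3; a later "NOT_PRESENT" or "RESOURCE_RESTORED" event for that slot then resolves the status to extracted or active.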