Specifications

2 Software Architecture
Peer communication loss causes failover
If communication between the peer ShMCs breaks down, the first of the previous
communication mechanisms (ShMC to ShMC over IPMB) fails. The standby Shelf Manager
then attempts to ping its peer over the LAN interface to confirm its full high availability (HA) state.
If the ping also fails, the standby Shelf Manager effects a failover, assumes the active role,
and sends out an event notifying all event receivers of the failover. This can happen if the
active Shelf Manager was hot-swapped from the shelf improperly or if the local IPMB on the
active Shelf Manager hardware failed, causing it to delink itself from IPMB0.
Watchdog expiration causes failover
If the operating environment that the ShMS is running on fails, the IPMI watchdog in the
ShMC expires, causing it to reset the module hosting the ShMS so that it reboots to a good
state. If this happens in the active Shelf Manager, a failover to the standby Shelf Manager
occurs and the previously active Shelf Manager assumes the standby role after it reboots.
However, if this happens in the standby Shelf Manager, then no failover is required, but a
payload reset to a good working standby mode still happens.
Peer communication loss causes failover, non-redundancy
If the peer ShMSs are not able to maintain communication over the LAN interface, they
attempt to synchronize over the IPMB. If that fails as well, the standby ShMS assumes the
active role. This protects against an unlikely condition where the active ShMS has an
unrecoverable failure, but keeps running in a nonfunctional state. With communication over
the LAN interface lost, the Shelf Managers are no longer considered redundant, and the Shelf
Manager redundancy sensor on page 28 issues an alarm.
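
The two peer-communication-loss cases described so far can be summarized in a small decision function. The following Python sketch is illustrative only and is not Shelf Manager code: the names PeerLinks, StandbyDecision, and evaluate_standby are hypothetical, and the model assumes that the standby fails over only when both the IPMB and LAN paths to its peer are lost, and that the redundancy alarm is tied to loss of the LAN path, as the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerLinks:
    """Hypothetical snapshot of the standby's view of its peer."""
    ipmb_ok: bool  # ShMC-to-ShMC communication over IPMB is working
    lan_ok: bool   # peer answers over the LAN interface


@dataclass(frozen=True)
class StandbyDecision:
    failover: bool          # standby assumes the active role
    redundancy_alarm: bool  # redundancy sensor issues an alarm


def evaluate_standby(links: PeerLinks) -> StandbyDecision:
    # Assumption: a failover happens only when neither channel reaches the peer.
    failover = not links.ipmb_ok and not links.lan_ok
    # Per the text, losing the LAN path alone ends the redundant state.
    redundancy_alarm = not links.lan_ok
    return StandbyDecision(failover=failover, redundancy_alarm=redundancy_alarm)


if __name__ == "__main__":
    for ipmb_ok in (True, False):
        for lan_ok in (True, False):
            links = PeerLinks(ipmb_ok=ipmb_ok, lan_ok=lan_ok)
            print(links, "->", evaluate_standby(links))
```

Whether the loss of the IPMB path alone also affects the redundancy sensor is not stated here, so the sketch leaves the alarm off in that case.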
Watchdog expiration causes reboot of non-redundant SCM
If no standby Shelf Manager is present and the communication between the active ShMS and
ShMC breaks down, the ShMC watchdog still expires, causing a software reboot of the SCM. In
this case, the Shelf Manager still reboots to a good working state and reassumes the active
role. The reboot results in downtime, because there is no ShMS to monitor and respond to
events from shelf components while the Shelf Manager payload is rebooting.
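
The watchdog cases, with and without a standby Shelf Manager, reduce to three outcomes. The Python sketch below is a minimal model under stated assumptions, not the actual implementation: Role, WatchdogOutcome, and on_watchdog_expiry are hypothetical names, and peer_present stands for "a working peer Shelf Manager is installed".

```python
from enum import Enum
from typing import NamedTuple


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class WatchdogOutcome(NamedTuple):
    failover: bool                  # peer takes over the active role
    role_after_reboot: Role         # role of the rebooted Shelf Manager
    unmanaged_during_reboot: bool   # no ShMS handles shelf events meanwhile


def on_watchdog_expiry(role: Role, peer_present: bool) -> WatchdogOutcome:
    """Model what follows a watchdog-triggered reset of one Shelf Manager."""
    if role is Role.STANDBY:
        # Standby resets to a good standby state; no failover is needed.
        # peer_present is ignored here because an active peer is implied.
        return WatchdogOutcome(False, Role.STANDBY, False)
    if peer_present:
        # Active resets; the standby takes over and the old active
        # comes back in the standby role.
        return WatchdogOutcome(True, Role.STANDBY, False)
    # Non-redundant: the only Shelf Manager reboots and reassumes the
    # active role, leaving the shelf unmanaged while it is down.
    return WatchdogOutcome(False, Role.ACTIVE, True)


if __name__ == "__main__":
    print(on_watchdog_expiry(Role.ACTIVE, peer_present=True))
    print(on_watchdog_expiry(Role.STANDBY, peer_present=True))
    print(on_watchdog_expiry(Role.ACTIVE, peer_present=False))
```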
Initiating a failover manually
You can manually initiate a failover by:
• Using the platform management CLI.
• Using HPI‐Control number 0x1010 in resource 0x02. Refer to the Shelf Manager failover
control section of the SAF Mapping Specification for more information.
• Toggling (opening and then closing) the bottom ejector latch of the active SCM.
• Powering off or power cycling the active SCM using HPI with the
saHpiResourcePowerStateSet() function.
• Extracting the active SCM either manually or using HPI with the
saHpiHotSwapActionRequest() function.