High Availability Failover Training Session for the HP e3000 MPE/iX 7.0 & 7.5 Sept, 2003
Agenda
• Important HAFO doc changes
• Problem description
• Software/hardware caveats
• When or where HAFO is/isn’t a good fit
• Configuration setup
• HAFO commands
• HAFO events
• Recovery and troubleshooting
• Q&A
Important HAFO Documentation Changes • http://jazz.external.hp.com/mpeha/hafo/32650-90911.pdf http://www.docs.hp.com/mpeix/pdf/32650-90911.
Problem Description
hp e3000 high availability failover with dual active paths
[Figure: DATA stream flowing across the SCSI card and cable to the XP or VA7x10 controller]
Problem Description
hp e3000 high availability failover with dual active paths
[Figure: the DATA stream across the SCSI card and cable to the XP or VA7x10 controller breaks up]
Problem Description
SCSI HAFO ~ No Response Errors Analog Anomalies
Problem Description
FC HAFO + No Response Errors Analog Anomalies
Software/Hardware Caveats
• HAFO and Cluster/iX are not supported (together)
• All Ldevs (luns) on a bus must be configured as HAFO protected
• All HAFO protected Ldevs must use similar connection strategy (switches and paths)
• HAFO is not a fault-tolerant solution (unplanned outages are converted to planned outages to fully recover)
• HAFO is dependent on performance expectations (false failovers)
• Plan plan plan…
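The "all Ldevs on a bus" caveat can be checked before a configuration is committed. The following is a minimal Python sketch, not an HP tool: the function name, the dictionary layout, and the reading that the rule applies to any bus carrying at least one HAFO protected Ldev are all assumptions made for illustration.

```python
def unprotected_ldevs(bus_ldevs, hafo_ldevs):
    """Return {bus: [ldevs]} for Ldevs on a HAFO bus that are not HAFO protected.

    bus_ldevs  -- hypothetical mapping of bus path to the Ldevs configured on it
    hafo_ldevs -- hypothetical list of Ldevs that have a HAFO configuration
    """
    protected = set(hafo_ldevs)
    gaps = {}
    for bus, ldevs in bus_ldevs.items():
        on_bus = set(ldevs)
        # Assumption: the rule only matters for buses that carry
        # at least one HAFO protected Ldev.
        if on_bus & protected:
            missing = sorted(on_bus - protected)
            if missing:
                gaps[bus] = missing
    return gaps


if __name__ == "__main__":
    buses = {"0/0.36": [1, 30, 31], "0/0.28": [101, 102, 103]}
    # Ldev 31 has been left out of the HAFO configuration on purpose.
    print(unprotected_ldevs(buses, [1, 30, 101, 102, 103]))
```

An empty result means every Ldev on every HAFO bus is protected.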
HAFO? Is it for you?
• HAFO is not the magic panacea of HA
• HAFO adds complexity to the operation of a system
• Unless there is a good understanding of the I/O characteristics of the system, you will introduce false failovers and reduce the I/O throughput of MPE
Configure the storage array
Create the Luns on the XP so that they are visible to MPE from both ports.
Use Mapper or FCSCAN -h to verify the Lun connection.
[Figure: MPE/iX High Availability. The File System and Volume Mgmt layers reach two array controllers over SCSI/FC buses; one bus carries Ldev 1, 30 and 31, the other Ldev 101, 102 and 103.]
Assigning Luns to Ports
Port address assigned by the path taken.
Example:
  Primary Port ID is 36
  Secondary Port ID is 28
To generate a path to Ldev 1, take Port + Lun addr = 36.0
An alternate path to Ldev 1 is 28.0

Lun Addrs
  Port.0   Ldev 1
  Port.1   Ldev 30
  Port.2   Ldev 31
  Port.3   Ldev 101
  Port.4 Port.
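The Port + Lun addr rule can be written out as a small Python sketch. The helper names are hypothetical; the only values taken from the slide are port IDs 36 and 28 and Lun address 0 for Ldev 1.

```python
def lun_path(port_id, lun_addr):
    """Join a port ID and a Lun address as 'port.lun', e.g. 36 and 0 give '36.0'."""
    return f"{port_id}.{lun_addr}"


def both_paths(lun_addr, primary_port, secondary_port):
    """Return (primary path, alternate path) for one Lun address."""
    return lun_path(primary_port, lun_addr), lun_path(secondary_port, lun_addr)


if __name__ == "__main__":
    # Ldev 1 sits at Lun address 0; primary port 36, secondary port 28.
    primary, alternate = both_paths(0, primary_port=36, secondary_port=28)
    print(primary, alternate)  # 36.0 28.0
```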
Configuring Ldevs Using Sysgen
Create a configuration for Ldev 1-31 on path or port 36, and configure Ldev 101-103 on path or port 28.
Example:
  io>ad path=0/0.36.0
  io>ad path=0/0.36.1
  io>ad path=0/0.36.2
  io>ad path=0/0.28.3
  io>ad path=0/0.28.4
  io>ad path=0/0.28.
Protecting Ldevs with HAFO Using Sysgen
Enter the HA menu and issue:
  ha>ad 1   0/0.36.0 0/0.28
  ha>ad 30  0/0.36.1 0/0.28
  ha>ad 31  0/0.36.2 0/0.28
  ha>ad 101 0/0.28.3 0/0.36
  ha>ad 102 0/0.28.4 0/0.36
  ha>ad 103 0/0.28.5 0/0.
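When many Ldevs need protecting, the command lines can be generated rather than typed. This is a minimal Python sketch: it only builds text in the `ad <ldev> <primary path> <alternate path>` pattern shown on the slide and does not talk to Sysgen. The function name, the tuple layout, and the default bus prefix `0/0` are assumptions for illustration.

```python
def addconf_lines(devices, bus="0/0"):
    """Build 'ad' command text for the HA menu.

    devices -- iterable of (ldev, primary_port, lun_addr, alternate_port)
    bus     -- hypothetical common bus prefix for both paths
    """
    lines = []
    for ldev, primary_port, lun_addr, alternate_port in devices:
        primary = f"{bus}.{primary_port}.{lun_addr}"  # full path including the Lun
        alternate = f"{bus}.{alternate_port}"         # alternate names the port only
        lines.append(f"ad {ldev} {primary} {alternate}")
    return lines


if __name__ == "__main__":
    for line in addconf_lines([(1, 36, 0, 28), (101, 28, 3, 36)]):
        print(line)
```

The example uses only Ldev 1 and Ldev 101, whose paths appear in full on the slide.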
Creating User Volumes or Adding Members to the System Volume Set
Using Volutil:
User Volumes
• Use the Newset and Newvol commands to create your user volume set.
• Then VSCLOSE and VSOPEN the volume set before using it.
Adding to the System Volume Set
• Use the Newvol command.
[Figure: Ldev 1, 30 and 31 on path 36]
Important HAFO Commands
ADDCONF
addconf (ad)
Example:
  ha>ad 450 0/6/2/1.3.3 0/6/2/0 True
New feature: the Timeout parm defaults to True. This parm allows for storage and server configurations where very poor I/O performance has been identified as the cause of false failovers.
Important HAFO Commands
LISTCONF
ha>LISTCONF
Ldev  Primary Path         Alternate Path     Timeout
===== ==================== ================== =======
350   0/4/0/0.70954.23     0/6/0/0.73289      True
351   0/4/0/0.70954.24     0/6/0/0.73289      True
352   0/6/0/0.73289.25     0/4/0/0.70954      False
353   0/6/0/0.73289.26     0/4/0/0.70954      False
450   0/6/2/1.3.3          0/6/2/0            True
451   0/6/2/1.3.4          0/6/2/0            True
452   0/6/2/0.3.5          0/6/2/1            False
453   0/6/2/0.3.
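If the LISTCONF listing is captured to a text file, it can be turned into records for review, for example to see which Ldevs have Timeout set to False. The following Python sketch assumes the four-column layout shown on the slide; the function name and the dictionary keys are hypothetical, and incomplete rows are skipped rather than guessed at.

```python
def parse_listconf(text):
    """Parse captured LISTCONF text into a list of dicts, one per complete row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly: Ldev, primary path, alternate path, True/False.
        if len(fields) == 4 and fields[0].isdigit() and fields[3] in ("True", "False"):
            rows.append({
                "ldev": int(fields[0]),
                "primary": fields[1],
                "alternate": fields[2],
                "timeout": fields[3] == "True",
            })
    return rows


if __name__ == "__main__":
    sample = """Ldev  Primary Path         Alternate Path     Timeout
===== ==================== ================== =======
350   0/4/0/0.70954.23     0/6/0/0.73289      True
352   0/6/0/0.73289.25     0/4/0/0.70954      False
"""
    for row in parse_listconf(sample):
        print(row["ldev"], row["primary"], row["alternate"], row["timeout"])
```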
Important HAFO Commands
DOHA
ha> doha
Start of validation for all HAFO configured devices.
=====================================================
VALIDATING ** Ldev: 50 Pri path: 8.15.0 Alt path: 48
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ldev 50 configuration Validated Successfully
VALIDATING ** Ldev: 51 Pri path: 8.15.1 Alt path: 48
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ldev 51 configuration Validated Successfully
End of validation for all HAFO configured devices.
Important HAFO Commands
GONEXT
ha> Go
After the problem has been repaired, issue the GoNext command to put the Ldevs back on their primary paths. Do not use this command to move an Ldev back to a known bad path; doing so may result in a reboot to reinitialize the MPE I/O configuration.
HAFO Event
HIGH AVAILABILITY FAILOVER IS STARTED FOR Ldev# IN DISK ARRAY.
NO DATA LOSS OR CORRUPTION. SYSTEM OPERATION WILL CONTINUE.
PLEASE PLACE SERVICE CALL SOON.
ACKNOWLEDGE HAFO FAILOVER IN DISK ARRAY FOR Ldev# (Y/N)?
Reply to the message
:HASTAT
High Availability Failover Device Status
Ldev  Primary Path         Alternate Path     Pri. Status     Alt. Status
===== ==================== ================== =============== ===============
350   0/4/0/0.70954.23
351   0/4/0/0.70954.24
352   0/6/0/0.73289.
353
450
451
452
453
HAFO Event
[Slide repeats the failover console message and :HASTAT output, with these callouts:]
• Ldev 350 and 351 have encountered an array error and have switched over successfully.
• Ldev 451 switched over because of an I/O timeout.
Recovering from a HAFO Event
Array Failure Error
• This is a failure in the path of the Ldev; it could be the HBA, the array controller, or any component in between. Diagnose this problem as you would any other hardware component, by collecting system and diagnostic log information.
• Only after repairing the part should you use the GoNext command.
[Callouts: Ldev 350 and 351 have encountered an array error and have switched over successfully. Ldev 451 switched over because of an I/O timeout.]
Recovering from a HAFO Event
Timeout Failover
• Treat this as if it were a hardware failure. Collect system and diagnostic log information. This information, along with performance data, is needed to prove that the I/O timeout is due only to the storage array being unable to keep up with the I/O load of MPE, and is not another cause masquerading as a timeout.
[Callouts: Ldev 350 and 351 have encountered an array error and have switched over successfully. Ldev 451 switched over because of an I/O timeout.]
Slide Presentation Conclusion
Sorry, but the questions and answers from the last live presentation were for internal use only.