HP StorageWorks Continuous Access EVA administrator guide Part number: T3687–96043 Fourth edition: December 2005
Legal and notice information © Copyright 2003, 2005 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Contents
About this guide
Intended audience
Prerequisites
Related documentation
Document conventions and symbols
HP technical support
Subscription service
HP web sites
Providing feedback
Synchronize management servers and arrays
Test failover
3 Failover and recovery
Planning for a disaster
Failover and recovery procedures
Performing failover and recovery
Choosing a failover procedure
Planned failover
6 Troubleshooting
Synchronizing replication manager log files
Troubleshooting the environment
Troubleshooting the SAN
Troubleshooting arrays
When a destination array is offline
LUN inaccessible to host
Remote server cannot detect a destination LUN
DR groups in unknown state
Tunnel thrash
Long delays or time-outs on HP–UX
Figures
1 DR group replication
2 Replicating relationships among DR groups
3 Cabling EVAs with 2–port controllers and active-active failover support
4 Cabling EVAs with 4–port controllers
5 Cabling EVAs with controller software VCS 3.x
6 Remote replication fabrics with redundant servers
Tables
1 Document conventions
2 Replication products and interfaces
3 When to fail over a DR group, managed set, or array
4 Array log
5 Replication manager display icons
6 Manual configuration form
About this guide This guide defines concepts and describes the setup and use of HP StorageWorks Continuous Access EVA for disaster recovery. For the latest information about this product, see HP StorageWorks EVA replication software release notes on the HP Continuous Access EVA web site: http://h18006.www1.hp.com/products/storage/software/conaccesseva/index.html.
Document conventions and symbols Table 1 Document conventions Convention Element Blue text: Table 1 Cross-reference links and e-mail addresses Blue, underlined text: http://www.hp.
For continuous quality improvement, calls may be recorded or monitored. Subscription service HP strongly recommends that customers register online using the Subscriber's choice web site: http://www.hp.com/go/e-updates. Subscribing to this service provides you with e-mail updates on the latest product enhancements, newest driver versions, and firmware documentation updates as well as instant access to numerous other product resources.
1 About HP StorageWorks Continuous Access EVA HP StorageWorks Continuous Access EVA is the remote replication component of HP StorageWorks Enterprise Virtual Array (EVA) controller software. When this component is licensed and configured, the controller copies data online, in real time, to a remote array over a storage area network (SAN). Properly configured, HP Continuous Access EVA is a disaster-tolerant storage solution that ensures data integrity if an array or site fails.
Bidirectional replication When an array contains both source virtual disks and destination virtual disks, it is bidirectional. An array can have a bidirectional data replication relationship with up to two other arrays. Individual virtual disks can have only unidirectional relationships with one other virtual disk.
[Figure 1 DR group replication. Callouts: 1. Host server 2. Switch 3. Host I/O 4. Replication writes 5. Local array 6. Remote array 7. Source virtual disk 8. Destination virtual disk 9. Source and destination DR groups]
DR group log The DR group log is a designated virtual disk that stores a source DR group's host writes while replication to the destination DR group is stopped. This process is called logging.
DR group log states A DR group log can be in one of the following states: • Unused (Normal)–No source virtual disk is logging or merging. • Logging–At least one source virtual disk is writing to the DR group log but none are merging. • Merging–At least one source virtual disk is merging and logging. DR group log size When created, a log disk contains 139 MB of Vraid1 space. The log disk grows as needed when the DR group is logging.
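The three log states above can be read as a simple precedence rule. The following is a minimal Python sketch, not product code: the function name, the enum, and the `(is_logging, is_merging)` input pairs are assumptions made for illustration, and the array itself reports the state in practice.

```python
from enum import Enum


class LogState(Enum):
    UNUSED = "Unused (Normal)"
    LOGGING = "Logging"
    MERGING = "Merging"


def dr_group_log_state(disks):
    """Derive a DR group log state from per-disk activity.

    `disks` is an iterable of (is_logging, is_merging) pairs, one per
    source virtual disk. This input shape is hypothetical.
    """
    disks = list(disks)
    # Merging: at least one source virtual disk is merging.
    if any(merging for _, merging in disks):
        return LogState.MERGING
    # Logging: at least one disk writes to the log and none are merging.
    if any(logging for logging, _ in disks):
        return LogState.LOGGING
    # Unused (Normal): no disk is logging or merging.
    return LogState.UNUSED
```

Checking for merging first mirrors the wording of the list: the Logging state applies only when no member is merging.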
NOTE: Failover can take other forms in the EVA environment: • Controller failover is the process that takes place when one controller in a pair assumes the workload of a failed or redirected companion controller in the same cabinet. • Fabric or path failover is the act of transferring I/O operations from one fabric or path to another. This guide describes the failover of DR groups and managed sets.
[Figure 2 Replicating relationships among DR groups. Panels: Normal replicating relationships; Behavior after loss of active site. Callouts: 1. Source array before failover 2. Destination array before failover 3. Replication 4. Destination array 5. Source array 6. Local site 7. Remote site 8. Failover 9. Logging]
Failsafe mode Failsafe mode specifies how host writes and remote replication behave when a group member fails.
• Failsafe enabled–If any virtual disk within the DR group fails or becomes unreachable, host I/O and remote replication automatically stop for all DR group members. This preserves the order of the replicated data. A failsafe-enabled DR group can be in one of two states: • Locked (failsafe-locked)–Host I/O and remote replication automatically stop. • Unlocked (failsafe-unlocked)–Host I/O and remote replication occur.
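As a rough illustration of the failsafe-enabled behavior described above, the sketch below models the two states in Python. The function names and the list-of-booleans input are hypothetical; the sketch covers only failsafe-enabled DR groups and is not how the controller software is implemented.

```python
def failsafe_lock_state(members_reachable):
    """Return the state of a failsafe-enabled DR group.

    `members_reachable` holds one boolean per virtual disk in the group
    (hypothetical input). A single failed or unreachable member locks
    the whole group, which preserves the order of the replicated data.
    """
    if all(members_reachable):
        return "failsafe-unlocked"
    return "failsafe-locked"


def host_io_allowed(state):
    """Host I/O and remote replication occur only while unlocked."""
    return state == "failsafe-unlocked"
```

For example, `failsafe_lock_state([True, False, True])` yields `"failsafe-locked"`, so host I/O stops for all three members, not only the failed one.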
database resides on that array. Instructions for its use can be found in the HP StorageWorks Command View EVA Online Help. To use HP Continuous Access EVA you must first use HP Command View EVA to: • Add licenses • Initialize controllers–This process binds the controllers together as an operational pair and establishes the first disk group on the disk array. • Create disk groups–A disk group is a set of physical disks from which storage pools are created.
2 Remote replication setup Local and remote sites can be as close as the same room or thousands of miles apart. Performance-optimized implementations may include three or four sites in multiple replication relationships. For supported distances and multiple replication relationships, see HP StorageWorks Continuous Access EVA planning guide. For distance technologies and fabric rules, see Volumes 2 and 4 of HP StorageWorks SAN design reference guide on the HP SAN Infrastructure web site: http://h18006.ww1.
Figure 3 Cabling EVAs with 2–port controllers and active-active failover support Figure 4 shows the supported cabling scheme for remote replication on EVA models with 4–port controllers. On these models, each controller has redundant connections to both fabrics. Even-numbered ports are connected to one fabric and odd-numbered ports are connected to the other fabric.
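The even/odd rule for 4–port controllers can be written down as a small lookup. This is a sketch only: the fabric labels ("fabric A", "fabric B") and the port numbering 1 through 4 are assumptions for illustration, and the cabling figure remains the authority.

```python
def fabric_for_port(port_number, even_fabric="fabric A", odd_fabric="fabric B"):
    """Map a controller host port to a fabric by port-number parity.

    The fabric names are hypothetical placeholders.
    """
    return even_fabric if port_number % 2 == 0 else odd_fabric


def cabling_plan(ports=(1, 2, 3, 4)):
    """Return {port: fabric} for one controller (port numbers assumed)."""
    return {port: fabric_for_port(port) for port in ports}
```

With four ports, each fabric receives two connections per controller, which is the redundancy the text describes.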
Figure 5 Cabling EVAs with controller software VCS 3.x Install replication licenses When you purchase HP Continuous Access EVA, you receive a replication license for each array (local and remote) in a remote replication relationship. Replication licenses are based on the amount (in TB) of replicated data on each array. For license offerings, see the product Quickspecs on the HP Continuous Access EVA web site.
Figure callouts: 1. Local active management server 2. Local host 3. Local controller 1 4. Local black fabric switch 5. Local controller 2 6. Local gray fabric switch 7. Local standby management server (optional) 8. Interswitch link (black fabric) 9. Remote standby management server 10. Remote host 11. Remote controller 1 12. Remote black fabric switch 13. Remote controller 2 14. Remote gray fabric switch 15.
[Figure 7 Local and remote management server zones. Callouts: 1. Local management server 2. Remote management server 3. Local array 1 4. Local array 2 5. Remote array 1 6. Remote array 2 7. Management zone A 8. Management zone B 9. Remote replication management zone 10. Fabric]
For instructions to create zones, see your switch user guide. Also follow the best practices in Volume 2 of HP StorageWorks SAN design reference guide.
• Hosts without multipathing software • Hosts with operating systems not supported by HP Continuous Access EVA (for supported operating systems, see HP StorageWorks EVA software compatibility reference) • Incompatible storage system products (for compatible storage systems, see HP StorageWorks EVA software compatibility reference); for example, HSG controllers with Data Replication Manager Remote replication zones can include compatible offline tape devices.
• Calculate the disk group occupancy alarm setting according to the EVA best practice, being sure to include the total maximum capacity for all DR group logs. (See HP StorageWorks Enterprise Virtual Array configuration best practices white paper.) Add hosts Adding a host defines a path between the host HBAs and the arrays in the management zone. Using HP Command View EVA, add each host that needs access to the local (source) or remote (destination) arrays.
Using HP Replication Solutions Manager or HP Command View EVA, create DR groups on the local (source) array. At a minimum, you must specify a source virtual disk and a destination array. The array software creates a corresponding DR group and virtual disk on the remote array. Specifying virtual disks Select one virtual disk to create the source DR group.
Set up remote and standby management servers Use the backup from the local management server to duplicate the replication configuration on remote and standby management servers. Before importing the configuration from the local replication manager, you must assume active management of the arrays in the configuration using HP Command View EVA on the remote or standby management server. For the procedure to acquire control of the arrays, see HP Command View EVA user guide.
3 Failover and recovery This chapter provides information for failing over and resuming operations after a planned or unplanned loss of operation. Several scenarios cover most situations you could encounter, with procedures for handling each scenario. Planning for a disaster Planning helps minimize the downtime brought on by a disaster. Include the following in your disaster recovery planning: • Operate with a supported disaster-tolerant configuration.
Always verify that all components of the remote array are operational before you fail over. NOTE: HP recommends that you not fail over any DR group more frequently than once every 15 minutes. Performing failover and recovery Failover and recovery procedures include such actions as failover, suspend, resume, disable failsafe, mounting, and unmounting.
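The 15-minute recommendation in the note above lends itself to a simple guard in any script that drives failovers. The following Python sketch is an illustration under stated assumptions: the function name and the idea of tracking the last failover time per DR group are hypothetical and are not features of the replication manager.

```python
from datetime import datetime, timedelta

# Minimum spacing between failovers of the same DR group, per the note.
MIN_FAILOVER_INTERVAL = timedelta(minutes=15)


def may_fail_over(last_failover, now):
    """Return True if a DR group may be failed over at time `now`.

    `last_failover` is the datetime of the previous failover of the
    same DR group, or None if it has never been failed over.
    """
    if last_failover is None:
        return True
    return now - last_failover >= MIN_FAILOVER_INTERVAL
```

A caller would record the time of each failover and consult this check before issuing the next one for the same DR group.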
Table 3 When to fail over a DR group, managed set, or array Failure situation Recommended action DR group in normal mode DR group in failsafe mode Maintenance requiring loss of access to source array Perform Planned failover on destination array. Total loss of source array Loss of both source controllers Manually intervene to fail over data on destination array, and then restart processing at the destination array. (Perform an Unplanned failover.)
[Flowchart boxes: Loss of source storage system (Unplanned; Planned/Unplanned; Planned). If possible: Stop host I/O. If applies: Activate another server. Verify normalization complete, then stop host I/O. If applies: Activate another server. Failover. Decision (Yes/No): Throttle I/O or stop replication? Start host I/O. If applies: failsafe to normal. Issue SUSPEND commands. Options: Remain failed over at destination; Fail over to preferred or home storage system; Fail over to new hardware.] Figure 8 Planned and unplanned
For all operating systems, including VMware Virtual OSs, but excluding Windows, stop all I/O to the virtual disks in source DR groups and unmount associated volumes or file systems. For VMware, also shut down each virtual machine on the VMware server, and either leave the virtual machines off until you fail back the LUNs, or, using the VM configuration editor, remove all LUN assignments in the hardware configuration files. For Windows, flush all cache files and shut down the operating system.
For Windows, flush all cache files and shut down the operating system. Small files held in Windows cache can disrupt remote replication. Reboot the host as Windows requires; HP Continuous Access EVA is neutral to rebooting the host. 2. If you cannot access the management server that is managing the arrays, establish management control with another management server. For instructions, see HP Command View EVA user guide. 3. Fail over the destination DR groups.
[Figure 9 Resumption of operation if unable to access destination in failsafe mode. Flowchart boxes: Loss of destination in failsafe mode; Change failsafe to normal; Resume host I/O; If applies: throttle I/O; Merge or full copy when destination available; If applies: normal to failsafe.]
Procedure: 1. Change affected source DR groups from failsafe-enabled mode to normal mode. 2. If necessary, issue operating system commands to the local hosts to restart I/O on the virtual disks that were failsafe-locked.
Fail back to the original source Possible scenario: You are operating from an array that is not the original source (designated Home in the replication manager). You need to move operations from the remote array back to the local array. Action summary: Prepare the source array for the failover and fail over the destination (original source) DR group. Failback (also known as reverting to Home) is similar to a planned failover.
Failover to new hardware Delete all DR groups on surviving system Replace failed hardware Add hosts to system with new hardware Power up new hardware No Configuration captured with SSSU? Yes Add disk groups to system with new hardware Run SSSU script step1A on new hardware Create non-DR group virtual disk Run SSSU script step2 on surviving hardware Present non-DR group virtual disk to hosts Recreate DR groups on surviving hardware Create DR groups on surviving hardware Present hosts Failover D
Table 4 Array log Array with failed or new hardware Current source array Array Name Array Name Array Name Array Name Array Name Array Name Array Name Array Name 1. Record the names of the array with failed or new hardware (current destination) and the current source array in a table such as Table 4. For example, your array with new hardware may be named HSV01 and your current source array may be named HSV02. Refer to this table during the procedure as needed. 2.
11. Add the hosts for the system with new hardware. 12. Create the non-DR group virtual disks. 13. Present all non-DR group virtual disks to their hosts. 14. Perform one of the following: a. If the source array configuration was captured with the SSSU, execute ConfigName_step2 on the source array. ConfigName is a user-assigned name given to the SSSU script at the time of creation. DR groups are re-created with the SSSU if they were performing as the source when the configuration was captured.
Table 5 Replication manager display icons Resource Symbol Description Array Indicates the array is in an abnormal state and requires attention. Virtual disks Indicates a catastrophic failure and requires immediate action. DR groups Red indicates a failure; yellow indicates the DR group is in a degraded state. Either condition requires immediate attention.
Figure 11 Disk Group Hardware Failure window 3. Click Start deletion process. After a prompt for confirmation, a list of failed DR groups is displayed. 4. One at a time, select the affected DR groups and click Delete. Deleting a DR group removes the relationship between the virtual disk members. It does not delete data from the virtual disks. 5. Select and delete the failed virtual disks that were members of the affected DR groups.
Disk group hardware failure on the destination array Possible scenario: A hardware failure on a destination array causes a disk group to become inoperative. This can be caused by the loss of enough disks to create a loss of redundancy within the disk group and affects all Vraid types present on the disk group. Action summary: Delete the DR groups on the source array that replicated to the failed disk group. Repair the failed disk group on the destination array.
7. On the surviving (source) array, delete the source DR group associated with the DR groups deleted in Step 3. 8. (Optional) Repair your hard drives and re-create your disk group on the destination array. (See the HP Command View EVA documentation). 9. Refresh the surviving (source) array and re-create the DR groups. 10. On the destination array, present the destination virtual disks.
4 Operating system specifics This chapter describes operating system procedures that accompany remote replication procedures, especially for failover and recovery. Resuming host I/O after failover Procedures for detecting disk devices and restarting I/O operations after DR group failover differ among operating systems. Host operating system procedures are provided here for your convenience. HP OpenVMS procedure to resume I/O 1. If the remote hosts are shut down, boot them now.
# vgimport Example: # vgimport /dev/vg09 /dev/dsk/c18t0d /dev/dsk/c18t1d0 /dev/dsk/c25t0d0 • You can then attempt to display this volume group using # vgdisplay -v /dev/vg09. If this returns a "Volume group not activated" error, activate the volume group using the vgchange command. # vgchange -a y Example: # vgchange -a y /dev/vg09 • You may receive errors when trying to mount the failed-over volume (an error stating the volume is corrupt).
and activate them. However, you must manually mount each individual NSS volume by entering MOUNT VolumeName at the NetWare console. • If the remote hosts are already up and running, or if they do not recognize the drives, issue the following command from the console before mounting the volumes: SCAN FOR NEW DEVICES Alternatively, you can use the NWCONFIG utility to issue this same command.
Example: # cfgadm -c configure c3::500060e802eb2b0b # cfgadm -c configure c4::500060e802eb2b14 NOTE: The controller instance (c#) may differ between systems. • If you are using Solaris 9, run this command to update the sd driver: # update_drv -f sd • Run the devfsadm command to build the appropriate device files: # devfsadm -C • If you are using Solaris 2.
NOTE: This procedure is not supported for unplanned failovers. The term "bootless" means that after the LUNs are first presented to a destination host, which requires an initial reboot, no further reboot of that host should be required. Source host procedure Perform one of the following steps on the source host, depending on whether or not you are running LifeKeeper 4.4.3. 1. If you are running LifeKeeper 4.4.3, proceed to step 2. If you are not running LifeKeeper, perform the following steps: a.
Perform one of the following steps on the destination host, depending on whether or not you are running LifeKeeper 4.4.3. 1. If you are running LifeKeeper 4.4.3, proceed to step 2. If you are not running LifeKeeper, perform the following steps: a. Issue the following command to make the volume known to the system: vgimport VolumeGroupName PhysicalVolumePath Example: vgimport vg01 /dev/sda1 b. Mount the file systems. Example: mount -t reiserfs /dev/vg01/lvol1 /mounts/lvol1 c. Start host I/O. d.
5 Managing remote replication This chapter describes routine and advanced replication procedures. Using remote replication in a mixed EVA environment Specific remote replication features depend on the controller software version.
risk. If a major failure occurs at the local site during a full copy, the snapclone provides a clean copy of data as it existed before full copy writes started on the remote array. (Any new writes that occurred on the source between the time the snapclone was created and the major failure occurred would be lost.) As a best practice, whenever a link is expected to be down more than several minutes, create a snapclone of the destination virtual disk.
IMPORTANT: If the password for the HP Command View EVA instance on the remote or standby server is different from the password imported with the local replication manager database, all storage resources on the remote management server will be displayed in an unknown state.
Manually capturing your configuration You can capture configuration information by writing it down manually. Use the following form as a guideline for capturing the information. Record the World Wide Name (WWN) of each host HBA, array controller, and management server on local and remote sites. The WWN is a hexadecimal number on the bottom of the HBA board. Look for a small bar code label with an IEEE precursor. An example is 1000–0000–C920–A5BA. Keep a copy of the record at each site.
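When transcribing WWNs from labels into a record, it can help to normalize them to one format so that copies kept at each site compare equal. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, and the choice of a 16-digit value written as four hyphen-separated groups simply follows the example WWN above.

```python
import re


def normalize_wwn(text):
    """Normalize a WWN to the form XXXX-XXXX-XXXX-XXXX (uppercase hex).

    Accepts hyphens, en dashes, colons, or spaces as separators.
    Raises ValueError if the result is not exactly 16 hex digits.
    """
    # Remove common separators before validating the digits.
    digits = re.sub(r"[-\u2013:\s]", "", text).upper()
    if not re.fullmatch(r"[0-9A-F]{16}", digits):
        raise ValueError(f"not a 16-digit hexadecimal WWN: {text!r}")
    # Regroup into four blocks of four digits.
    return "-".join(digits[i:i + 4] for i in range(0, 16, 4))
```

For instance, a value copied as `10:00:00:00:c9:20:a5:ba` normalizes to the same string as the label form in the example, so the two records can be compared directly.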
Table 6 Manual configuration form
Array name:
Array WWN:
Console LUN ID: (default = 0)
Disk group information
Disk group name: (default = default disk group)
Device count:
Spare policy: (none, single, or double)
Disk type: (online or nearline)
Occupancy alarm: (default = 95%)
Host information
Folder name: (default = \Hosts\)
Host name:
Operating system:
For each HBA port: WWN:
Virtual disk information
Folder name: (default = \Virtual Disks\)
Virtual disk name:
Disk group:
Size:
Redundancy le
Upgrading controller software Planning a controller software upgrade Before upgrading array controller software, ensure that all arrays in remote replication relationships are fully functional with no failed hardware. The following additional conditions must be met: • All arrays in remote replication relationships are running the same version of controller software in the VCS or XCS series.
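The two conditions stated so far (no failed hardware, and one controller software version across all replicating arrays) can be checked mechanically before an upgrade is scheduled. The Python sketch below assumes a hypothetical inventory format, a list of dictionaries with `name`, `version`, and `healthy` keys; it does not query arrays and covers only these two conditions.

```python
def upgrade_blockers(arrays):
    """List reasons an upgrade should not start yet.

    `arrays` is a list of dicts with hypothetical keys:
    'name' (str), 'version' (str), 'healthy' (bool).
    An empty result means both conditions are met.
    """
    blockers = []
    # Every array must be fully functional with no failed hardware.
    for array in arrays:
        if not array["healthy"]:
            blockers.append(f"{array['name']}: failed hardware")
    # All arrays must run the same controller software version.
    versions = sorted({array["version"] for array in arrays})
    if len(versions) > 1:
        blockers.append(
            "controller software versions differ: " + ", ".join(versions)
        )
    return blockers
```

An inventory of two healthy arrays on the same version returns an empty list; any other result names what must be fixed first.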
• Data distribution–Pushing copies of data to other geographic locations to make it locally accessible. • Data migration–Moving data to a new location or to one with a larger storage capacity. To move data using a snapclone: 1. Make a snapclone of the virtual disk containing the data to be moved. See the online help for procedures on creating snapclones. After the snapclone is created, the link from the snapclone to its original virtual disk dissolves, and the snapclone becomes a separate virtual disk. 2.
[Figure 13 Creating a DR group from a snapclone. Callouts: 1. HSV05 array 2. HSV06 array 3. HSV18 array 4. DR group 5. Virtual disk 1 6. Virtual disk 2 7. Replication 8. Virtual disk snapclone]
Using snapclones to cascade data replication across three sites NOTE: An HP Business Copy EVA license is required for the following procedure. This procedure allows you to move copies of your data to a second remote location using HP Command View EVA and snapclones.
group), these members are added to a DR group called DR snapclone1. This DR group now resides on a source array that replicates to the desired destination array (Site 3). At the remote location, the virtual disk members can be removed from the DR group, renamed, and archived. NOTE: For this procedure, Site 1 is called the source array, Site 2 (the destination for the DR group from Site 1) is called the intermediate array, and Site 3 is referred to as the remote array.
An Operation completed page is displayed. Procedure 1. Enable failsafe mode for any DR group containing more than one replication pair. 2. Set synchronous write mode for any DR groups in this procedure. 3. If normalization is occurring to members of the DR group to be moved, wait for the members to normalize. 4. If an application requires that I/O be suspended before creation of a snapclone, suspend I/O at this time. 5.
6 Troubleshooting This chapter provides troubleshooting guidance for arrays and links between multiple sites. Synchronizing replication manager log files The replication manager server and host agents generate log files for job events. These detailed event log files can be helpful to HP support personnel when troubleshooting replication jobs. Ensure the usefulness of these logs by synchronizing the array clocks to the management server (see Synchronize management servers and arrays).
0c1e5f0c: Severity: Critical – failure or failure imminent. The members of the specified Source Data Replication Group have not been presented to the host because the remote Storage System is not accessible. To resolve this situation, you can: • Fail over the destination DR groups and present the virtual disks from the new source to the desired hosts. With this option, any data in the DR group log on the previous source array is lost.
• Ensure that all routers are configured correctly. • Contact your service provider to check whether the circuit has been rerouted over an alternate path. • Check whether thrashing occurs during peak times and not during low-volume times. If so, the circuit may be oversubscribed and you may need to increase bandwidth. NOTE: An informational event (c22000c) is generated for an open tunnel. No action is required.
Glossary This glossary defines terms used in this guide or related to this product and is not a comprehensive glossary of computer terms. array See virtual array and storage system. asynchronous A descriptive term for computing models that eliminate timing dependencies between sequential processes. In asynchronous replication, the array controller acknowledges that data has been written at the source before the data is copied at the destination. Asynchronous replication is an optional DR group property.
HP Continuous Access EVA HP Continuous Access EVA is a storage-based HP StorageWorks product consisting of two or more arrays performing disk-to-disk replication, along with the management user interfaces that facilitate configuring, monitoring, and maintaining the replicating capabilities of the arrays. Home The DR group that is the preferred source in a replication relationship. By default, Home is the original source, but it can be set to the destination DR group.
source–destination pair A copy set. Storage Management Appliance HP OpenView Storage Management Appliance, an HP hardware–software product designed to run SAN management applications such as HP StorageWorks Command View EVA and HP StorageWorks Replication Solutions Manager. storage system Synonymous with virtual array. The HP StorageWorks Enterprise Virtual Array consists of one or more storage systems. See also virtual array.