Building Disaster Recovery Serviceguard Solutions Using Metrocluster with 3PAR Remote Copy for Linux B.12.00.
Legal Notices © Copyright 2014 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents
1 Introduction
    Overview of HP 3PAR concepts
        Remote Copy pairs
        Remote Copy Volume Groups
6 Troubleshooting
    Troubleshooting Metrocluster
        Metrocluster log
        HP 3PAR storage system log
1 Introduction

This document describes the procedure to configure data replication solutions using HP 3PAR storage systems to provide disaster recovery for Serviceguard clusters over long distances. This chapter describes the HP 3PAR Remote Copy software and the additional files that integrate the HP 3PAR storage system with Metrocluster.

Overview of HP 3PAR concepts

The 3PAR storage systems are configured for use in data replication from one 3PAR storage system unit to another.
synchronization is manually initiated. If an area of the volume is written to multiple times between two synchronizations, only the last write needs to be synchronized with the other storage system.

Remote Copy target definitions

As part of the Remote Copy setup process, you must create target definitions on each Remote Copy system. The target definitions are descriptions that exist on one system to identify a Remote Copy system.
• Metrocluster uses the 3PAR CLI to communicate with the storage array. Ensure that port 5783 is not blocked by a firewall.
• Upgrading the HP 3PAR storage system to any of the following 3PAR OS versions impacts the HP Metrocluster 3PAR package:
  ◦ 2.3.1 MU5 Patch35
  ◦ 3.1.1 MU3 Patch27
  ◦ 3.1.2 MU3 Patch16
  After the upgrade to the patches is complete, a new self-signed 2048-bit RSA SSL Certificate is created on the HP 3PAR Array.
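The firewall requirement on port 5783 can be checked from a cluster node before you configure any package. The following is a minimal sketch, not part of the product: the function name cli_port_reachable and the 3-second timeout are assumptions made for illustration. A successful TCP connection only shows that the port is reachable; it does not prove that a 3PAR CLI login will succeed.

```python
import socket


def cli_port_reachable(host: str, port: int = 5783, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    5783 is the port named in this guide; the helper name and the
    timeout value are illustrative assumptions.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

Run the check from every node in both data centers against both storage systems, because a firewall rule can differ per network path.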
NOTE: The maximum number of CLI connections to a 3PAR storage array is 64.

Overview of a Metrocluster configuration

A Metrocluster is configured with the nodes at Site A and Site B. When Site A and Site B form a Metrocluster, a third location is required where Quorum Server or arbitrator nodes must be configured. There is a 3PAR storage system at each site, and the two systems are connected to each other through Remote Copy links. An application is deployed in a Metrocluster by configuring it at both sites.
Overview of Metrocluster 3PAR Remote Copy Volume Group monitor

In a Metrocluster environment, it is necessary to actively monitor the Remote Copy Volume Group state. Otherwise, it is difficult to determine when the application data became remotely unprotected for an extended period of time. To address this, the Metrocluster 3PAR Remote Copy Volume Group monitor provides the capability to monitor the status of the Remote Copy Volume Group used in a package.
2 Configuring an application in a Metrocluster environment

Installing the necessary software

Before a Metrocluster can be configured, make sure the following software is installed on all nodes:
• Serviceguard for Linux A.12.00.00 or later
• HP 3PAR InForm OS CLI
• Metrocluster with 3PAR Remote Copy for Linux

Creating the cluster

NOTE: The file /etc/cmcluster.conf contains the mappings that resolve symbolic references to $SGCONF, $SGROOT, $SGLBIN, etc.
NODE_NAME SJC_2
SITE san_jose
........

Use the cmviewcl command to view the list of sites that are configured in the cluster and their associated nodes. The following is a sample of the command and its output:

# cmviewcl -l node
SITE_NAME san_francisco
NODE     STATUS   STATE
SFO_1    up       running
SFO_2    up       running
.........
SITE_NAME san_jose
NODE     STATUS   STATE
SJC_1    up       running
SJC_2    up       running

You can configure either of these failover policies for Metrocluster failover packages.
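If you want to confirm the site-to-node mapping in a script, the cmviewcl -l node output shown in the sample above can be turned into a simple dictionary. The following is a sketch that assumes the output has exactly the layout of that sample (a SITE_NAME line, a NODE STATUS STATE header, then one three-column line per node); the function name parse_sites is hypothetical, and real output may contain additional lines that this sketch ignores.

```python
from typing import Dict, List


def parse_sites(cmviewcl_output: str) -> Dict[str, List[str]]:
    """Map each site name to its node names.

    Assumes the three-column layout of the sample output in this guide.
    """
    sites: Dict[str, List[str]] = {}
    current = None
    for line in cmviewcl_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "SITE_NAME" and len(fields) >= 2:
            # A new site section starts here.
            current = fields[1]
            sites[current] = []
        elif current is not None and len(fields) == 3 and fields[0] != "NODE":
            # A node line: name, status, state. Header and elision lines are skipped.
            sites[current].append(fields[0])
    return sites
```

For the sample above, the result maps san_francisco to SFO_1 and SFO_2, and san_jose to SJC_1 and SJC_2.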
NOTE: When a TPVV is configured as a primary volume in a Remote Copy Volume Group, no data should be written on the secondary volume before adding it to the Remote Copy Volume Group, or it must match the primary volume. This enables the primary and secondary volumes to match during initial synchronization.

The Remote Copy Volume Group can be created using either the HP 3PAR Management Console GUI or the HP 3PAR CLI.
where:
◦ -snp_cpg is the name of the copy group from which the snapshot space is allocated. You can use the CPG that was created in step 2 or any other CPG to allocate the space for the snapshot.
◦ -usr_aw is the allocation warning alert limit for the user space, specified as a percentage. This generates an alert when the user space of the volume exceeds the specified percentage of the volume’s size.
4.
• is the name of the virtual volume created in step 3.
• is the type of replication to be used. For synchronous replication, use sync; for periodic asynchronous replication, use periodic.

HP also recommends that you set the following auto_recover policy for the Remote Copy Volume Group:

cli% setrcopygroup pol auto_recover
• b.
where:
• is the name of the domain to which the new user will belong. If you are using domains, specify the name of an existing domain in your system. Specify 'all' as the domain name if you are not using any domain.

NOTE: For Metrocluster operations, the user must have “edit” privileges. HP strongly recommends that you create the Remote Copy Volume Groups and a user with “edit” privileges for the Metrocluster operations under a 3PAR storage system domain.
   # pvcreate -f /dev/sda1
2. Create the volume group on the source volume.
   # vgcreate --addtag $(uname -n) /dev/ /dev/sda1
   # vgcreate --addtag $(uname -n) /dev/
3. Create the logical volume (XXXX indicates the size in MB).
   # lvcreate -L XXXX /dev/
4. Create a file system on the logical volume.
   # mke2fs -j /dev//lvol1
5. If required, deactivate the volume groups on the primary system and remove the tag.
   # vgchange -a n
   # vgchange --deltag $(uname -n)
6.
Creating VxVM disk groups

If you are using VERITAS storage, use the following procedure to set up the VERITAS disk groups. On one node in the source disk site, do the following:
1. Run the vxdisksetup command on the primary system to initialize the disks to be used with VxVM.
   # /etc/vx/bin/vxdisksetup -i
2. Use the vxdg command on the primary system to create a disk group.
   # vxdg init
3.
6. Verify that the file system is present, and then unmount the file system.
   # umount /
7. Deport the disk group.
   # vxdg deport
   Repeat steps 2 through 7 on all nodes in the cluster that require access to this disk group.
8. Log in to the 3PAR storage system at the source disk site. Reverse the direction of replication to bring it back to its original direction.
a. Specify the package directory for the dts/3parrc/dts_pkg_dir attribute.
   dts/3parrc/dts_pkg_dir $SGCONF/pkg
b. Specify the DC1 nodes for the DC1_NODE_LIST parameter. Separate multiple names with a space. You cannot specify IP addresses for the DC1_NODE_LIST parameter.
   dts/3parrc/DC1_NODE_LIST "dc1_node1 dc1_node2"
c. Specify the DC2 nodes for the DC2_NODE_LIST parameter. Separate multiple names with a space.
NOTE: For steps f through k, use the HP 3PAR Management Console or the HP 3PAR CLI to identify the values for configuring the Metrocluster with 3PAR Remote Copy for Linux attributes.

l. Specify the timeout, in minutes, to wait for completion of the Remote Copy Volume Group resynchronization from the source to the destination volume before starting up the package on the destination.
   dts/3parrc/RESYNC_WAIT_TIMEOUT 5
   The legal values for this parameter are “0” (the default value), “no_timeout”, or a value greater than “0”.
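Before applying a package configuration, you might want to check the RESYNC_WAIT_TIMEOUT value against the legal values listed above. The following is a minimal sketch under that rule only; the function name parse_resync_wait_timeout and the choice to represent no_timeout as None are assumptions for illustration, not Metrocluster behavior.

```python
from typing import Optional


def parse_resync_wait_timeout(value: str) -> Optional[int]:
    """Validate a RESYNC_WAIT_TIMEOUT value.

    Returns the timeout in minutes as an integer (0 or greater), or None
    for "no_timeout". Raises ValueError for anything else.
    """
    text = value.strip()
    if text == "no_timeout":
        # Illustrative convention: None stands for waiting without a limit.
        return None
    if text.isascii() and text.isdigit():
        # Plain non-negative integers only; a leading minus sign is rejected.
        return int(text)
    raise ValueError("illegal RESYNC_WAIT_TIMEOUT value: {!r}".format(value))
```

With this sketch, the sample value 5 from the step above is accepted and returned as the integer 5, while values such as -1 or five are rejected.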
   service_cmd "$SGSBIN/DR3PARRCMon $SG_PACKAGE_NAME"
   service_restart unlimited
• To receive notifications about status changes of the Remote Copy Volume Groups, specify one or more notification types in the DR_NOTIFICATION_CHOICES attribute. The DR_NOTIFICATION_CHOICES attribute supports the following types of notifications:
  ◦ EMAIL
    NOTE: For email notifications, you must include the sg/email module as part of the package.
The following constraints apply to the listed attributes when you modify packages online:
• DC1_STORAGE_SYSTEM_NAME, DC2_STORAGE_SYSTEM_NAME
  You can change the storage system names for the attribute, but changing the array is not supported.
• DC1_STORAGE_SYSTEM_USER, DC2_STORAGE_SYSTEM_USER
  You can change the user name, but you must recreate the password files for the newly created user name.
Table 2 Identification parameters

Parameter name: Package Name
Description: Any name that is unique among package names in the selected cluster.
Valid values: Any name, up to a maximum of 39 characters, that:
• Starts and ends with an alphanumeric character
• Otherwise contains only alphanumeric characters, dot (.), dash (-), or underscore (_)

Parameter name: Package Description
Description: A brief description of the application managed by the package.
Valid values: Can contain a maximum of 80 characters.
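The Package Name rules in Table 2 can be expressed as a short check, which is useful when package names are generated by a script. The following is a sketch, not the validation performed by the product: the function name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# First and last characters alphanumeric; dot, dash, and underscore are
# allowed only in between. A single alphanumeric character is also valid.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")
MAX_NAME_LENGTH = 39


def is_valid_package_name(name: str) -> bool:
    """Return True if name satisfies the Table 2 rules for Package Name."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None
```

Uniqueness among package names in the cluster is a separate requirement that this sketch does not check.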
Figure 4 Select Replication Type

4. Enter the 3PAR parameters in the 3PAR Replication Parameters section. Table 3 (page 24) describes the 3PAR replication parameters.

Table 3 3PAR replication parameters

Parameter name: Metrocluster Package Configuration Directory
Description: The directory that is checked for the presence of the FORCEFLAG file. Use FORCEFLAG when the package is not allowed to start up automatically but you want to start it forcibly after understanding the risks.
Table 3 3PAR replication parameters (continued)

Parameter name: Remote Copy Volume Group
Description: The Remote Copy volume group name configured on the 3PAR storage system, containing the disks used by the application.
NOTE: The Data Center 2 Remote Copy volume group must not be the same as that of Data Center 1.

Parameter name: Remote Copy Target
Description: The Remote Copy target name defined on the 3PAR storage system in Data Center 1 for the 3PAR storage system in Data Center 2.
Table 4 Package behavior parameters (continued)

Default value: No

Parameter name: Resynchronization Wait Timeout
Description: This parameter defines the wait timeout for Remote Copy volume group resynchronization. It specifies the timeout, in minutes, to wait for completion of the Remote Copy volume group resynchronization from the source to the destination volume before starting up the package on the destination.
Valid values: Zero, a value greater than zero, or no timeout.
Figure 7 Setting Notifications

7. If you selected Email Notification, enter an email address in the Event Notification text box and click Add to receive mail notifications about the package status.

Figure 8 Adding Email Address

8. When you set the notification to Others and set an appropriate value, a service is added to the package. Go to Resource Parameters → Services and Scripts to see the added service.
Figure 9 Services

9. Click Create to create the package and return to the Packages screen.
   • If the package is created successfully, the left pane of the Packages screen lists the new package.
   • If the package is not created, relevant error messages appear at the bottom of the Create Package screen. Fix the errors and then click Create, or click Cancel to start from the beginning.
3 Metrocluster Features

Cluster verification

Starting with HP Serviceguard version A.12.00.00, the cmcheckconf -v command validates the cluster and the package configuration. Metrocluster uses this functionality to ensure the sanity of the Metrocluster package configuration. HP recommends that you set up a cron job to regularly run the cmcheckconf command. For more information on the cmcheckconf command, see the cmcheckconf (1m) manpage.
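One way to follow the cron recommendation is a small wrapper script that runs the check and reports the result. The following is a sketch under stated assumptions: the function name run_cluster_check is hypothetical, and the wrapper assumes that cmcheckconf -v signals a failed validation with a non-zero exit code, which you should confirm in the cmcheckconf (1m) manpage.

```python
import subprocess
from typing import Sequence, Tuple


def run_cluster_check(command: Sequence[str] = ("cmcheckconf", "-v")) -> Tuple[int, str]:
    """Run the verification command and return (exit_code, combined_output).

    The default command is the one named in this guide. Treating a
    non-zero exit code as a failed validation is an assumption.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so warnings are not lost
            universal_newlines=True,
        )
    except FileNotFoundError:
        # The command is not installed or not in PATH on this node.
        return 127, "command not found: {}".format(command[0])
    return result.returncode, result.stdout
```

A cron entry would call a script that invokes run_cluster_check() and mails or logs the output when the exit code is not zero; the schedule and script location are site-specific choices.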
Table 5 Validating Metrocluster package (continued)

# cmcheckconf -p

Verify the replication status of the Remote Copy Volume Group based on the package configuration file.
    cmcheckconf [-v]
    # cmcheckconf
When the package is up and running, the cmcheckconf command verifies the replication status. If the replication is not happening, the validation script displays the following warning message.

WARNING: The replication is not happening.
Table 5 Validating Metrocluster package (continued)

ERROR: DC2_RC_VOLUME_GROUP ${RC3PAR_DC2_RC_VOLUME_GROUP} is not the name of Remote Copy Volume Group ${RC3PAR_DC1_RC_VOLUME_GROUP} on the storage system ${RC3PAR_DC2_STORAGE_SYSTEM_NAME}.

Verify whether the Remote Copy target names match the actual Remote Copy target of the replication group as mentioned in the package configuration file.
    cmcheckconf [-v]
    # cmcheckconf
4 Understanding failover/failback scenarios

Failover/failback scenarios in a Metrocluster package

This section describes a couple of rolling disaster scenarios. In the first scenario, the link had gone down previously and is now up. The data from the primary volume group is being synced with the remote Remote Copy Volume Group. The package has failed in the primary site and is now trying to start at the recovery site.
Table 6 Replication modes and failover scenarios (continued)

Local RCVG Role | Remote RCVG Role | Replication State/Link Status | Replication Mode | Metrocluster Parameters | Metrocluster Action

happening from the remote storage system to the local storage system. Start the Remote Copy Volume Group manually before restarting the package.
Table 6 Replication modes and failover scenarios (continued)

Local RCVG Role | Remote RCVG Role | Replication State/Link Status | Replication Mode | Metrocluster Parameters | Metrocluster Action

Error: The Remote Copy link is down and data in the local storage system may not be current. The user has set AUTO_NONCURDATA to “0” and has not created the FORCEFLAG file. To start the package forcefully using non current data, use FORCEFLAG file.

Resolution
in the location specified in the DTS_PKG_DIR parameter.
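The condition named in the error above (AUTO_NONCURDATA set to 0 and no FORCEFLAG file in the package directory) can be checked ahead of a planned start. The following is a sketch of that one condition only, not a reimplementation of the Metrocluster startup logic: the function name is hypothetical, and the sketch assumes the file is named exactly FORCEFLAG and sits directly in the directory given by DTS_PKG_DIR.

```python
import os


def noncurrent_start_blocked(auto_noncurdata: str, dts_pkg_dir: str) -> bool:
    """Return True when the condition reported in the error applies.

    That is: AUTO_NONCURDATA is "0" and no FORCEFLAG file exists in
    dts_pkg_dir. Any other combination returns False, which means only
    that this particular error condition does not apply.
    """
    if str(auto_noncurdata).strip() != "0":
        return False
    # Assumed location: a file named FORCEFLAG directly in the package directory.
    flag_path = os.path.join(dts_pkg_dir, "FORCEFLAG")
    return not os.path.isfile(flag_path)
```

Creating the FORCEFLAG file starts the package on data that may not be current, so use it only after understanding the risks.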
5 Administering a Metrocluster with 3PAR Remote Copy for Linux

Administering a cluster that uses Metrocluster 3PAR Remote Copy

While the package is running, a manual storage failover on the Remote Copy Volume Group outside of the Metrocluster software can cause the package to halt due to an unexpected condition of the 3PAR Remote Copy virtual volumes. HP recommends that no manual storage failover be performed while the package is running.
Restoring replication after a failover

When the Metrocluster package fails over to the remote site and the links are not up or the primary storage system is not up, Metrocluster issues the setrcopygroup failover command. This command changes the role of the Remote Copy Volume Group on the storage system in the recovery site from Secondary to Primary-Rev. In this role, the data is not replicated from the recovery site to the primary site.
1. From the Main menu, select Packages.
   • The Packages screen is displayed.
2. From the left pane, select the package you want to start.
   • The overview page for the selected package is displayed in the right pane.
   NOTE: You can only start a package whose status is down, which is indicated by a red icon.
3. From the Actions drop-down, select Advanced Run.
   • The Advanced Run screen is displayed. Here, select the nodes on which you want to run the package.
3. From the Actions drop-down, select Halt.
   • If the package halts, the message Successfully halted package on the node appears at the top of the screen. The package status changes to down, as indicated by a red icon.
   • If the package fails to halt, the message Unable to halt package appears at the top of the screen.

Using Advanced Halt option

To halt a package, follow these steps:
1. From the Main menu, select Packages.
   • The Packages screen is displayed.
2.
Figure 12 Halt and Advanced Halt option

Rolling upgrade

Metrocluster configurations follow the HP Serviceguard rolling upgrade procedure. The HP Serviceguard documentation includes rolling upgrade procedures to upgrade the Serviceguard version, the operating environment, and other software. This Serviceguard procedure, along with its recommendations, guidelines, and limitations, is applicable to Metrocluster.
6 Troubleshooting

Troubleshooting Metrocluster

To troubleshoot problems with Metrocluster with 3PAR Remote Copy for Linux, you must understand HP 3PAR Remote Copy environments. See the Remote Copy User Guide for more information on Remote Copy configuration and volume group states.
Table 7 Error Messages and their Resolution (continued)

Log Messages | Cause | Resolution

file. The package is not allowed to start up. To start the package forcefully using non current data, use FORCEFLAG file.

The Remote Copy Volume Group is in "Syncing" state and RESYNC_WAIT_TIMEOUT parameter is set to 0. The package is not allowed to start up.

latest data in the local storage system. Restart the package.
Table 7 Error Messages and their Resolution (continued)

Log Messages | Cause | Resolution

storage system for the corresponding user.
• The CLI connections are exhausted. For more information, see “Managing CLI connections to 3PAR array” (page 44).

Not able to determine the status of the remote storage system. This might be because of CLI connectivity issues or because the remote storage system is down. The role of local Remote Copy Volume Group's is “Primary”.
Table 7 Error Messages and their Resolution (continued)

Log message: Starting of the Remote Copy Volume Group [] has failed to complete. This means that Remote Copy is not functioning between the primary and secondary volume groups.
Cause: The startrcopygroup command failed.
Resolution: Start the replication group using either the startrcopygroup command or the 3PAR Management Console. Restart the package.
Table 7 Error Messages and their Resolution (continued)

NOTE: The password file must have the following format: <3parArrayUserName>_<3parArrayName>.pwf

Managing CLI connections to 3PAR array

The maximum number of CLI connections to a 3PAR storage array is 64. Metrocluster configuration and package startup operations use the CLI to connect to the storage array and to get information about the remote copy groups.
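Because password files must follow the <3parArrayUserName>_<3parArrayName>.pwf format noted above, a quick check for missing files can help when a user name changes. The following is a sketch: both function names are hypothetical, and the directory that holds the password files is passed in as a parameter because its location is not assumed here.

```python
import os
from typing import Iterable, List, Tuple


def password_file_name(array_user: str, array_name: str) -> str:
    """Build a file name in the <3parArrayUserName>_<3parArrayName>.pwf format."""
    return "{}_{}.pwf".format(array_user, array_name)


def missing_password_files(directory: str, pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the expected password file names that are absent from directory.

    pairs holds (array user name, array name) tuples, for example one for
    the DC1 storage system and one for the DC2 storage system.
    """
    expected = [password_file_name(user, array) for user, array in pairs]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty result means that every expected file name is present; it does not verify that the stored passwords are valid.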
A Checklist and worksheet for configuring Metrocluster with 3PAR Remote Copy for Linux

Disaster recovery checklist

Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for a configuration with two main data centers and a third location.

Data centers A and B have the same number of nodes to maintain quorum in case an entire data center fails.

Arbitrator nodes or Quorum Server nodes are located in a separate location from either of the primary data centers (A or B).
Member Timeout: _________________________________________________________
Network Polling Interval: _______________________________________________
AutoStart Delay: ________________________________________________________

Package configuration worksheet

Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the latest version of the Managing HP Serviceguard A.12.00.00 for Linux manual available at http://www.hp.com/go/linux-serviceguard-docs.
DC1 RC Target for DC2: ___________________________________________________
DC2 RC Volume Group: _____________________________________________________
DC2 Storage System User: _________________________________________________
DC2 Nodes List: __________________________________________________________
DC2 RC Target for DC1: ___________________________________________________
Glossary

A-C

3PAR Remote Copy
The 3PAR storage systems are configured for use in data replication from one 3PAR storage system unit to another. This type of physical data replication is a part of the Metrocluster with 3PAR Remote Copy for Linux.

arbitrator
Nodes in a disaster tolerant architecture that act as tie-breakers in case all of the nodes in a data center go down at the same time. These nodes are full members of the Serviceguard cluster and must conform to the minimum requirements.
LUN (Logical Unit Number)
A SCSI term that refers to a logical disk device composed of one or more physical disk mechanisms, typically configured into a RAID level.

M, N

manual failover
Failover requiring human intervention to start an application or service on another node.

Metrocluster
A Hewlett-Packard product that allows a customer to configure a Serviceguard cluster as a disaster recovery metropolitan cluster.