HP XP P9000 Continuous Access Synchronous for Mainframe Systems User Guide

10 Disaster recovery
This chapter discusses disaster recovery.
Disaster recovery overview
Preparing for disaster recovery involves the following major steps:
1. Identify the volumes and groups that contain important files and data for disaster recovery.
2. Create Continuous Access Synchronous Z pairs, paying special attention to the options in
P-VOL Fence Level Settings to ensure that the system responds the way you want in the event
of a failure (see “Fence Level options for I/O to the P-VOL after suspension” (page 36)).
3. Install and configure host failover software between the primary and secondary sites.
4. Establish file and database recovery procedures. Procedures for recovering volumes that
become inaccessible due to control unit failure should already be in place.
5. Make sure that the host system at the primary site is configured to receive sense information
from the XP P9500 main system (for example, using SNMP). This should also be done at the
remote or secondary site if a host is connected to it.
Remote copy and disaster recovery procedures are inherently complex. Consult your HP account
team on sense-level settings and recovery procedures.
Sense information shared between sites
When the main system suspends a Continuous Access Synchronous Z pair due to an error condition,
the main and remote systems send sense information with unit check status to the appropriate hosts.
This sense information is used during disaster recovery to determine the currency of the S-VOL, and
must be transferred to the remote or secondary site via the host failover software.
If the host system supports IBM PPRC and receives PPRC-compatible sense information related to
a Continuous Access Synchronous Z pair, the host OS will perform the following actions:
1. Temporarily suspend all application I/O operations to the P-VOL.
2. Enter an IEA491E message in the system log (SYSLOG) that indicates the time that the P-VOL
was suspended. Make sure that the system log is common to both the main and remote
operating systems.
3. Place specific information about the failure (SIM) in the SYS1.LOGREC dataset for use by
service personnel. See “Service Information Messages (SIMs)” (page 102) for more information
about SIMs.
4. Wait for the IEA491E message to reach the remote system.
5. Resume all host application I/O operations to the P-VOL. If the P-VOL fence level setting does
not allow subsequent updates, the MCU will return a unit check for all subsequent write I/O
operations, and the application will terminate.
Make sure that the MCUs and RCUs are configured to report the service-level SIMs to the host.
Select the Services SIM of Remote Copy = Report setting on the CU Option dialog box.
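Because the IEA491E message records the time that the P-VOL was suspended, a recovery script at the remote site might scan a copy of the system log for it. The following Python sketch is illustrative only. It assumes that the log has been exported as text lines that begin with a timestamp in the form YYYY-MM-DD HH:MM:SS; that layout and the function name first_suspension_time are hypothetical and are not defined by this guide or by the host OS.

```python
from datetime import datetime
from typing import Iterable, Optional

MESSAGE_ID = "IEA491E"  # message ID named in the steps above


def first_suspension_time(log_lines: Iterable[str]) -> Optional[datetime]:
    """Return the earliest timestamp found on an IEA491E line, or None.

    Assumption: each line starts with "YYYY-MM-DD HH:MM:SS" (19 characters).
    A real SYSLOG export will likely use a different layout.
    """
    earliest = None
    for line in log_lines:
        if MESSAGE_ID not in line:
            continue
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            # Skip lines that do not match the assumed timestamp layout.
            continue
        if earliest is None or stamp < earliest:
            earliest = stamp
    return earliest
```

The returned time can then be compared with other records to help judge the currency of the S-VOL. Adapt the timestamp parsing to the actual format of your system log.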
File and database recovery
File recovery procedures for disaster recovery should be the same as those used for recovering a
data volume that becomes inaccessible due to control unit failure.
Continuous Access Synchronous Z does not provide a procedure for detecting and retrieving lost
updates. To detect and recreate lost updates, you must check other current information (for example,
a database log file) that was active at the main system when the disaster occurred.
The detection and retrieval process can take some time. Your disaster recovery scenario should
be designed so that detection and retrieval of lost updates is performed after the application has
been started at the remote or secondary site.
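As a rough illustration of the detection step, the following Python sketch selects database log records written at or after the pair suspension time as candidates for review. It is a sketch under stated assumptions, not a procedure supplied by Continuous Access Synchronous Z: the LogRecord type, its fields, and the function candidate_lost_updates are hypothetical, and the sketch assumes that a usable copy of the log and a known suspension time are available.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical database log entry; real log formats will differ."""
    timestamp: datetime
    description: str


def candidate_lost_updates(
    records: Iterable[LogRecord], suspended_at: datetime
) -> List[LogRecord]:
    """Return records written at or after the suspension time, oldest first.

    Assumption: updates logged before suspended_at were copied to the S-VOL,
    so only later records need to be checked and possibly recreated.
    """
    late = [r for r in records if r.timestamp >= suspended_at]
    return sorted(late, key=lambda r: r.timestamp)
```

Each record returned is only a candidate. Whether an update is actually missing from the S-VOL must still be confirmed against the data at the remote or secondary site.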