CICS Transaction Server for z/OS Version 4 Release 1 Recovery and Restart Guide SC34-7012-01
Note Before using this information and the product it supports, read the information in “Notices” on page 243. This edition applies to Version 4 Release 1 of CICS Transaction Server for z/OS (product number 5655-S97) and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright IBM Corporation 1982, 2010. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Preface What this book is about This book contains guidance about determining your CICS® recovery and restart needs, deciding which CICS facilities are most appropriate, and implementing your design in a CICS region. The information in this book is generally restricted to a single CICS region. For information about interconnected CICS regions, see the CICS Intercommunication Guide. This manual does not describe recovery and restart for the CICS front end programming interface.
viii CICS TS for z/OS 4.
Changes in CICS Transaction Server for z/OS, Version 4 Release 1 For information about changes that have been made in this release, please refer to What's New in the information center, or the following publications: v CICS Transaction Server for z/OS What's New v CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.2 v CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.1 v CICS Transaction Server for z/OS Upgrading from CICS TS Version 2.
Part 1. CICS recovery and restart concepts
It is very important that a transaction processing system such as CICS can restart and recover following a failure. This section describes some of the basic concepts of the recovery and restart facilities provided by CICS.
Chapter 1. Recovery and restart facilities
Problems that occur in a data processing system could be failures with communication protocols, data sets, programs, or hardware. These problems are potentially more severe in online systems than in batch systems, because the data is processed in an unpredictable sequence from many different sources. Online applications therefore require a system with special mechanisms for recovery and restart that batch systems do not require.
In general, forward recovery is applicable to data set failures, or failures in similar data resources, which cause data to become unusable because it has been corrupted or because the physical storage medium has been damaged. Minimizing the effect of failures An online system should limit the effect of any failure. Where possible, a failure that affects only one user, one application, or one data set should not halt the entire system.
Another way is to shut down CICS with an immediate shutdown and perform the forward recovery, after which a CICS emergency restart performs the backward recovery. Recoverable resources In CICS, a recoverable resource is any resource with recorded recovery information that can be recovered by backout.
v In the event of an emergency restart, when CICS backs out all those transactions that were in-flight at the time of the CICS failure (emergency restart backout). Although these occur in different situations, CICS uses the same backout process in each case. CICS does not distinguish between dynamic backout and emergency restart backout.
The recovery manager also drives: v The backout processing for any units of work that were in a backout-failed state at the time of the CICS failure v The commit processing for any units of work that had not finished commit processing at the time of failure (for example, for resource definitions that were being installed when CICS failed) v The commit processing for any units of work that were in a commit-failed state at the time of the CICS failure See “Unit of work recovery” on page 73 for an explanation
Forward recovery journal names are of the form DFHJnn where nn is a number in the range 1–99 and is obtained from the forward recovery log id (FWDRECOVLOG) in the FILE resource definition. In this case, CICS creates a journal entry for the forward recovery log, which can be mapped by a JOURNALMODEL resource definition. Although this method enables user application programs to reference the log, and write user journal records to it, you are recommended not to do so.
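The naming rule above can be illustrated with a minimal sketch. The function name is hypothetical, and the zero-padding of single-digit values to two digits is an assumption drawn from the two-character nn in DFHJnn; it is not CICS code.

```python
def forward_recovery_journal_name(fwdrecovlog: int) -> str:
    """Return the DFHJnn journal name for a FWDRECOVLOG value (1-99).

    Hypothetical helper for illustration only.
    """
    if not 1 <= fwdrecovlog <= 99:
        raise ValueError("FWDRECOVLOG must be in the range 1-99")
    # Assumption: single-digit values are zero-padded so that nn is
    # always two characters.
    return f"DFHJ{fwdrecovlog:02d}"


if __name__ == "__main__":
    print(forward_recovery_journal_name(7))
```

A JOURNALMODEL resource definition could then map the resulting journal name to a log stream, as described above.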
2. If the failure occurs during the execution of a CICS syncpoint, where the conversation is with another resource manager (perhaps in another CICS region), CICS handles the resynchronization. This is described in the CICS Intercommunication Guide. If the link fails and is later reestablished, CICS and its partners use the SNA set-and-test-sequence-numbers (STSN) command to find out what they were doing (backout or commit) at the time of link failure.
When the operator replies to IXC402D, the CICS interregion communication program, DFHIRP, is notified and the suspended tasks are abended, and MRO connections closed. Until the reply is issued to IXC402D, an INQUIRE CONNECTION command continues to show connections to regions in the failed MVS as in service and normal. When the failed MVS image and its CICS regions are restarted, the interregion communication links are reopened automatically.
The CICS recovery manager then uses the information retrieved from the system log to: v Back out recoverable resources. v Recover changes to terminal resource definitions. (All resource definitions installed at the time of the CICS failure are initially restored from the CICS global catalog.) A special case of CICS processing following a system failure is covered in Chapter 6, “CICS emergency restart,” on page 61.
Chapter 2. Resource recovery in CICS
Before you begin to plan and implement resource recovery in CICS, you should understand the concepts involved, including units of work, logging and journaling.
Units of work
When resources are being changed, there comes a point when the changes are complete and do not need backout if a failure occurs later. The period between the start of a particular set of changes and the point at which they are complete is called a unit of work (UOW).
v Working storage
v Any LU6.2 sessions
v Any LU6.1 links
v Any MRO links
The resources CICS retains include:
v Locks on recoverable data. If the unit of work is shunted indoubt, all locks are retained. If it is shunted because of a commit- or backout-failure, only the locks on the failed resources are retained.
v System log records, which include:
– Records written by the resource managers, which they need to perform recovery in the event of transaction or CICS failures.
When a lock is first acquired, it is an active lock. It remains an active lock until the unit of work completes successfully, when it is released, or until it is converted into a retained lock because the unit of work fails or because CICS or SMSVSAM fails: v If a unit of work fails, RLS VSAM or the CICS enqueue domain continues to hold the record locks that were owned by the failed unit of work for recoverable data sets, but converts them into retained locks.
– EXEC CICS CREATE CONNECTION COMPLETE – EXEC CICS DISCARD CONNECTION – EXEC CICS DISCARD TERMINAL A UOW that does not change a recoverable resource has no meaningful effect for the CICS recovery mechanisms. Nonrecoverable resources are never backed out.
[Figure not reproduced: timeline of units of work (UOW 1 to UOW 4) for tasks A, B, and C, showing modifications, syncpoints, commits, and backouts relative to the moment of system failure.]
Abbreviations: EOT: End of task; UOW: Unit of work; Mod: Modification to database; SOT: Start of task; SP: Syncpoint; X: Moment of system failure.
v Managing the state, and controlling the execution, of each UOW v Coordinating UOW-related changes during syncpoint processing for recoverable resources v Coordinating UOW-related changes during restart processing for recoverable resources v Coordinating recoverable conversations to remote nodes v Temporarily suspending completion (shunting), and later resuming completion (unshunting), of UOWs that cannot immediately complete commit or backout processing because the required resources are unavailable, beca
v Notification that the resource is not available, requiring temporary suspension (shunting) of the UOW v Notification that the resource is available, enabling retry of shunted UOWs v Notification that a connection is reestablished, and can deliver a commit or rollback (backout) decision v Syncpoint rollback v Normal termination of the UOW The identity of a UOW and its state are owned by the CICS recovery manager, and are recorded in storage and on the system log.
others. This can happen, for example, if two data sets are updated and the UOW has to be backed out, and the following happens: v One resource backs out successfully v While committing this successful backout, the commit fails v The other resource fails to back out These events leave one data set commit-failed, and the other backout-failed. In this situation, the overall status of the UOW is logged as backout-failed.
Resynchronization after system or connection failure Units of work that fail while in an indoubt state remain shunted until the indoubt state can be resolved following successful resynchronization with the coordinator. Resynchronization takes place automatically when communications are next established between subordinate and coordinator. Any decisions held by the coordinator are passed to the subordinate, and indoubt units of work complete normally.
CICS also writes “backout-failed” records to the system log if a failure occurs in backout processing of a VSAM data set during dynamic backout or emergency restart backout. Records on the system log are used for cold, warm, and emergency restarts of a CICS region. The only type of start for which the system log records are not used is an initial start.
v User journaling is entirely under your application programs’ control. You write records for your own purpose using EXEC CICS WRITE JOURNALNAME commands. See “Flushing journal buffers” on page 28 for information about CICS shutdown considerations. v Automatic journaling means that CICS automatically writes records to a log stream, referenced by the journal name specified in a journal model definition, as a result of: – Records read from or written to files.
Chapter 3. Shutdown and restart recovery
CICS can shut down normally or abnormally, and this affects the way that CICS restarts after it shuts down.
v The DFHCESD program started by the CICS-supplied transaction, CESD, attempts to purge and back out long-running tasks using increasingly stronger methods (see “The shutdown assist transaction” on page 30). v Tasks that are automatically initiated are run—if they start before the second quiesce stage. v Any programs listed in the first part of the shutdown program list table (PLT) are run sequentially.
this indicator to determine the type of startup it is to perform. See “How the state of the CICS region is reconstructed” on page 34. v CICS writes warm keypoint records to: – The global catalog for terminal control and profiles – The CICS system log for all other resources. See “Warm keypoints.” v CICS deletes all completed units of work (log tail deletion), leaving only shunted units of work and the warm keypoint.
Flushing journal buffers During a successful normal shutdown, CICS calls the log manager domain to flush all journal buffers, ensuring that all journal records are written to their corresponding MVS system logger log streams. During an immediate shutdown, the call to the log manager domain is bypassed and journal records are not flushed. This also applies to an immediate shutdown that is initiated by the shutdown-assist transaction because a normal shutdown has stalled.
2. If the default shutdown assist transaction CESD is run, it allows as many tasks as possible to commit or back out cleanly, but within a shorter time than that allowed on a normal shutdown. See “The shutdown assist transaction” on page 30 for more information about CESD, which runs the CICS-supplied program DFHCESD. 3. None of the programs listed in the shutdown PLT is executed. 4. CICS does not write a warm keypoint or a warm-start-possible indicator to the global catalog. 5.
The next initialization of CICS must be an emergency restart, in order to preserve data integrity. An emergency restart is ensured if the next initialization of CICS specifies START=AUTO. This is because the recovery manager’s type-of-restart indicator is set to “emergency-restart-needed” during initialization, and is not reset in the event of an immediate or uncontrolled shutdown.
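The behavior of the type-of-restart indicator can be modeled with a short sketch. This is a simplified illustration under stated assumptions, not the recovery manager's implementation: the class and method names are hypothetical, the initial value of a new control record is assumed, and overrides such as DFHRMUTL or an explicit START=COLD or START=INITIAL are left out.

```python
from enum import Enum


class RestartIndicator(Enum):
    WARM_START_POSSIBLE = "warm-start-possible"
    EMERGENCY_RESTART_NEEDED = "emergency-restart-needed"


class ControlRecord:
    """Hypothetical stand-in for the recovery manager's control record."""

    def __init__(self) -> None:
        # Assumption: a new record starts in the pessimistic state.
        self.indicator = RestartIndicator.EMERGENCY_RESTART_NEEDED

    def on_initialization(self) -> None:
        # Set at every initialization, before any work runs.
        self.indicator = RestartIndicator.EMERGENCY_RESTART_NEEDED

    def on_normal_shutdown(self) -> None:
        # Only a successful normal shutdown resets the indicator.
        self.indicator = RestartIndicator.WARM_START_POSSIBLE

    def on_immediate_or_uncontrolled_shutdown(self) -> None:
        # The indicator is left unchanged on purpose.
        pass


def start_type_for_auto(record: ControlRecord) -> str:
    """Type of start that START=AUTO selects in this simplified model."""
    if record.indicator is RestartIndicator.WARM_START_POSSIBLE:
        return "warm"
    return "emergency"
```

Because the indicator is set at initialization and reset only by a normal shutdown, any run that ends in another way leaves the region in the emergency-restart state without further action.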
You are recommended always to use the CESD shutdown-assist transaction when shutting down your CICS regions. You can use the DFHCESD program “as is”, or use the supplied source code as the basis for your own customized version (CICS supplies versions in assembler, COBOL, and PL/I). For more information about the operation of the CICS-supplied shutdown assist program, see the CICS Operations and Utilities Guide.
– File control recovery blocks (only if a SHCDS NONRLSUPDATEPERMITTED command has been used).
If you ever need to redefine and reinitialize the CICS local catalog, you should also reinitialize the global catalog. After reinitializing both catalog data sets, you must perform an initial start. Shutdown initiated by CICS log manager The CICS log manager initiates a shutdown of the region if it encounters an error in the system log that indicates previously logged data has been lost.
and therefore recovery of the most recent units of work cannot be carried out. However, data might be missing from any part of the system log and CICS cannot identify what is missing. CICS cannot examine the log and determine exactly what data is missing, because the log data might appear consistent in itself even when CICS has detected that some data is missing.
Overriding the type of start indicator
The operation of the recovery manager's control record can be modified by running the recovery manager utility program, DFHRMUTL.
About this task
DFHRMUTL can set an autostart record that determines the type of start CICS is to perform, effectively overriding the type of start indicator in the control record. See the CICS Operations and Utilities Guide for information about using DFHRMUTL to modify the type of start performed by START=AUTO.
performs the recovery process for work that was in-flight when the previous run of CICS was abnormally terminated. Recovery of data during an emergency restart During the final stage of emergency restart, the recovery manager uses the system log data to drive backout processing for any units of work that were in-flight at the time of the failure.
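The idea of log-driven backout can be sketched as follows. This is a generic before-image model written for illustration, assuming a hypothetical log record layout and an in-memory dictionary standing in for recoverable data; it does not represent the CICS system log format or the recovery manager's algorithm.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (uow_id, kind, key, before_image).
# kind is "update" or "commit". For commit records key and before_image
# are None. For update records a before_image of None means the record
# did not exist before the update.
LogRecord = Tuple[str, str, Optional[str], Optional[str]]


def back_out_in_flight(log: List[LogRecord], data: Dict[str, str]) -> List[str]:
    """Undo updates made by units of work that have no commit record.

    Returns the identifiers of the units of work that were backed out.
    """
    committed = {uow for uow, kind, _, _ in log if kind == "commit"}
    backed_out: List[str] = []
    # Read the log backward so the most recent update is undone first.
    for uow, kind, key, before in reversed(log):
        if kind != "update" or uow in committed or key is None:
            continue
        if before is None:
            data.pop(key, None)   # the record was added by this UOW
        else:
            data[key] = before    # restore the before-image
        if uow not in backed_out:
            backed_out.append(uow)
    return backed_out
```

In this model, a unit of work that reached its syncpoint before the failure is untouched, and only the changes of in-flight units of work are reversed.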
You can do this by specifying START=INITIAL as a system initialization parameter, or by running the recovery manager's utility program (DFHRMUTL) to override the type of start indicator to force an initial start. See the CICS Operations and Utilities Guide for information about the DFHRMUTL utility program. Dynamic RLS restart If a CICS region is connected to an SMSVSAM server when the server fails, CICS continues running, and recovers using a process known as dynamic RLS restart.
Recovery with VTAM persistent sessions With VTAM persistent sessions support, if CICS fails or undergoes immediate shutdown (by means of a PERFORM SHUTDOWN IMMEDIATE command), VTAM holds the CICS LU-LU sessions in recovery-pending state, and they can be recovered during startup by a newly starting CICS region. With multinode persistent sessions support, sessions can also be recovered if VTAM or z/OS® fails in a sysplex.
During an emergency restart of CICS, CICS restores those sessions pending recovery from the CICS global catalog and the CICS system log to an in-session state. This process of persistent sessions recovery takes place when CICS opens its VTAM ACB. With multinode persistent sessions support, if VTAM or z/OS fails, sessions are restored when CICS reopens its VTAM ACB, either automatically by the COVR transaction, or by a CEMT or EXEC CICS SET VTAM OPEN command.
v If CICS determines that it cannot recover the session without unbinding and rebinding it. The result in each case is as if CICS has restarted following a failure without VTAM persistent sessions support. In some other situations APPC sessions are unbound. For example, if a bind was in progress at the time of the failure, sessions are unbound.
You can then start further CICS regions with or without persistent sessions support as appropriate, provided that you do not exceed the limit for the number of regions that do have persistent sessions support. If you specify NOPS (no persistent session support) for the PSTYPE system initialization parameter, a zero value is required for the PSDINT (persistent session delay interval) system initialization parameter.
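The dependency between PSTYPE and PSDINT could be checked before startup with a small sketch like the one below. The function name is hypothetical, only the NOPS rule stated above is checked, and PSDINT is treated as a plain integer for simplicity.

```python
def check_persistent_session_settings(pstype: str, psdint: int) -> None:
    """Raise ValueError if PSTYPE=NOPS is combined with a nonzero PSDINT.

    Hypothetical pre-startup check; other parameter rules are not covered.
    """
    if pstype.upper() == "NOPS" and psdint != 0:
        raise ValueError("PSTYPE=NOPS requires PSDINT=0")
```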
Part 2. Recovery and restart processes
You can add your own processing to the CICS recovery and restart processes. This part contains the following sections:
v Chapter 4, “CICS cold start”
v Chapter 5, “CICS warm restart”
v Chapter 6, “CICS emergency restart”
v Chapter 7, “Automatic restart management”
v Chapter 8, “Unit of work recovery and abend processing”
v Chapter 9
Chapter 4. CICS cold start
This section describes the CICS startup processing specific to a cold start.
– CICS requests the SMSVSAM server, if connected, to release all RLS retained locks. – CICS does not rebuild the non-RLS retained locks. v CICS requests the SMSVSAM server to clear the RLS sharing control status for the region. v CICS does not restore the dump table, which may contain entries controlling system and transaction dumps.
specify on the GRPLIST system initialization parameter. The CSD file definition is built and installed from the CSDxxxx system initialization parameters. Data tables As for VSAM file definitions. BDAM File definitions are installed from file control table entries, specified by the FCT system initialization parameter. Attention: If you use the SHCDS REMOVESUBSYS command for a CICS region that uses RLS access mode, ensure that you perform a cold start the next time you start the CICS region.
Transient data resource definitions are installed from resource groups defined in the CSD, as specified in the CSD group list (named on the GRPLIST system initialization parameter). Any extrapartition TD queues that require opening are opened; that is, any that specify OPEN(INITIAL). All the newly-installed TD queue definitions are written to the global catalog. All TD queues are installed as enabled.
If you define new resource definitions and install them dynamically, ensure the group containing the resources is added to the appropriate group list. Monitoring and statistics The initial status of CICS monitoring is determined by the monitoring system initialization parameters (MN and MNxxxx). The initial recording status for CICS statistics is determined by the statistics system initialization parameter (STATRCD).
Installable set install The following VTAM terminal control resources are committed in installable sets: v Connections and their associated sessions v Pipeline terminals—all the terminal definitions sharing the same POOL name If one definition in an installable set fails, the set fails. However, each installable set is treated independently within its CSD group. If an installable set fails as CICS installs the CSD group, it is removed from the set of successful installs.
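The all-or-nothing rule for installable sets can be shown with a minimal sketch. The data layout and the install_ok callback are hypothetical and stand in for whatever decides that a single definition installs cleanly; this is not how CICS represents CSD groups.

```python
from typing import Callable, Dict, List


def install_group(
    installable_sets: Dict[str, List[str]],
    install_ok: Callable[[str], bool],
) -> Dict[str, bool]:
    """Return, for each installable set, whether the whole set installed.

    A set succeeds only if every definition in it succeeds; the outcome
    of one set has no effect on the other sets in the group.
    """
    results: Dict[str, bool] = {}
    for set_name, definitions in installable_sets.items():
        results[set_name] = all(install_ok(d) for d in definitions)
    return results


if __name__ == "__main__":
    group = {
        "CONN1": ["CONN1", "SESS1A", "SESS1B"],  # connection plus sessions
        "POOL1": ["TRM1", "TRM2"],               # pipeline terminals in a pool
    }
    print(install_group(group, lambda name: name != "SESS1B"))
```

In the usage example, the failure of one session definition fails the whole connection set, while the pipeline pool is still installed.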
information saved in the system log from a previous run. The primary and secondary system log streams are purged and CICS begins writing a new system log. v Because CICS is starting a new catalog, it uses a new logname token in the “exchange lognames” process when connecting to partner systems. Thus, remote systems are notified that CICS has performed a cold start and cannot resynchronize. v User journals are not affected by starting CICS with the START=INITIAL parameter.
Chapter 5. CICS warm restart
This section describes the CICS startup processing specific to a warm restart. If you specify START=AUTO, which is the recommended method, CICS determines which type of start to perform using information retrieved from the recovery manager's control record in the global catalog. If the type-of-restart indicator in the control record indicates “warm start possible”, CICS performs a warm restart.
Files File control information from the previous run is recovered from information recorded in the CICS catalog only. File resource definitions for VSAM and BDAM files, data tables, and LSR pools are installed from the global catalog, including any definitions that were added dynamically during the previous run. The information recovered and reinstalled in this way reflects the state of all file resources at the previous shutdown.
Temporary storage Auxiliary temporary storage queue information (for both recoverable and non-recoverable queues) is retrieved from the warm keypoint. Note that TS READ pointers are recovered on a warm restart (which is not the case on an emergency restart). CICS opens the auxiliary temporary storage data set for update. Temporary storage data sharing server Any queues written to a shared temporary storage pool, even though non-recoverable, persist across a warm restart.
v All intrapartition TD queues are initialized empty. v The queue resource definitions are installed from the global catalog, but they are not updated by any log records or keypoint data. They are always installed enabled. This option is intended for use when initiating remote site recovery (see Chapter 6, “CICS emergency restart,” on page 61), but you can also use it for a normal warm restart.
Autoinstall for programs If program autoinstall is enabled (PGAIPGM=ACTIVE), program, mapset, and partitionset resource definitions are installed from the CSD only if they were cataloged; otherwise they are installed at first reference by the autoinstall process. All definitions installed from the CSD are updated with information from the warm keypoint in the system log.
Journal names and journal models The CICS log manager restores the journal name and journal model definitions from the global catalog. Journal name entries contain the names of the log streams used in the previous run, and the log manager reconnects to these during the warm restart.
v Different TCT from last run. CICS installs the TCT only, and does not apply the warm keypoint information, effectively making this a cold start for these devices. Note: CICS TS for z/OS, Version 4.1 supports only remote TCAM terminals—that is, the only TCAM terminals you can define are those attached to a remote, pre-CICS TS 3.1, terminal-owning region by TCAM/DCB.
Chapter 6. CICS emergency restart
This section describes the CICS startup processing specific to an emergency restart. If you specify START=AUTO, CICS determines what type of start to perform using information retrieved from the recovery manager’s control record in the global catalog. If the type-of-restart indicator in the control record indicates “emergency restart needed”, CICS performs an emergency restart.
Any non-RLS locks associated with in-flight (and other failed) transactions are acquired as active locks for the tasks attached to perform the backouts. This means that, if any new transaction attempts to access non-RLS data that is locked by a backout task, it waits normally rather than receiving the LOCKED condition. Retained RLS locks are held by SMSVSAM, and these do not change while backout is being performed.
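The difference between active and retained locks, as seen by a new transaction, can be summarized in a small sketch. The enumeration and the outcome strings are hypothetical labels chosen for illustration; the model assumes only what is described above, namely that a request against an active lock waits and a request against a retained lock receives the LOCKED condition.

```python
from enum import Enum


class LockState(Enum):
    UNLOCKED = "unlocked"
    ACTIVE = "active"      # held by a running task, such as a backout task
    RETAINED = "retained"  # held on behalf of a failed or shunted unit of work


def access_outcome(state: LockState) -> str:
    """Outcome for a new request against a record in the given lock state."""
    if state is LockState.ACTIVE:
        return "WAIT"      # the request queues until the lock is released
    if state is LockState.RETAINED:
        return "LOCKED"    # the request is rejected immediately
    return "PROCEED"
```

Reacquiring non-RLS locks as active locks for the backout tasks therefore turns what would be an immediate LOCKED response into an ordinary wait.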
Reconnecting to SMSVSAM for RLS access As on a warm restart, CICS connects to the SMSVSAM server. In addition to notifying CICS about lost locks, VSAM also informs CICS of the units of work belonging to the CICS region for which it holds retained locks. See “Lost locks recovery” on page 89 for information about the lost locks recovery process for CICS. CICS uses the information it receives from SMSVSAM to eliminate orphan locks.
Start requests In general, start requests are recovered only when they are associated with recoverable data or are protected and the issuing unit of work is indoubt. However, recovery can be further limited by the use of the specific COLD option on the system initialization parameter for TS, ICP, or BMS. If you suppress start requests by means of the COLD option on the appropriate system initialization parameter, any data associated with the suppressed starts is discarded.
is successful, but CICS abnormally terminates before the catalog can be updated, CICS recovers the information from the forward recovery records on the system log. If the installation or deletion of installable sets or individual resources is unsuccessful, or has not reached commit point when CICS abnormally terminates, CICS does not recover the changes.
Chapter 7. Automatic restart management
CICS uses the automatic restart manager (ARM) component of MVS to increase the availability of your systems.
If CICS is restarted by ARM with the same persistent JCL, CICS forces START=AUTO to ensure data integrity. Registering with ARM To register with ARM, you must implement automatic restart management on the MVS images that the CICS workload is to run on. You must also ensure that the CICS startup JCL used to restart a CICS region is suitable for ARM. Before you begin The implementation of ARM is part of setting up your MVS environment to support CICS.
CANCEL, CICS de-registers from ARM before terminating, because if CICS remained registered, an automatic restart would probably encounter the same error condition. For other error situations, CICS does not de-register, and automatic restarts follow. To control the number of restarts, specify in your ARM policy the number of times ARM is to restart a failed CICS region. Failing to register If ARM support is present but the register fails, CICS issues message DFHKE0401.
CICS START options You are recommended to specify START=AUTO, which causes a warm start after a normal shutdown and an emergency restart after failure. You are also recommended always to use the same JCL, even if it specifies START=COLD or START=INITIAL, to ensure that CICS restarts correctly when restarted by the MVS automatic restart manager after a failure.
The COVR transaction To ensure that CICS reconnects to VTAM in the event of a VTAM abend, CICS keeps retrying the OPEN VTAM ACB using a time-delay mechanism via the non-terminal transaction COVR. After CICS has completed clean-up following the VTAM failure, it invokes the CICS open VTAM retry (COVR) transaction. The COVR transaction invokes the terminal control open VTAM retry program, DFHZCOVR, which performs an OPEN VTAM retry loop with a 5-second wait.
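The retry pattern used by COVR can be illustrated with a generic sketch. This is not DFHZCOVR: the function and parameter names are hypothetical, and the max_attempts limit is an assumption added so that the example terminates; only the 5-second wait comes from the description above.

```python
import time
from typing import Callable


def open_with_retry(
    try_open: Callable[[], bool],
    wait_seconds: float = 5.0,
    max_attempts: int = 10,  # assumption: the limit is illustrative only
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_open until it succeeds, waiting between attempts.

    Returns True on success, or False if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if try_open():
            return True
        if attempt < max_attempts:
            sleep(wait_seconds)  # time delay before the next attempt
    return False
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays.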
You can also restart a server explicitly using either the server command CANCEL RESTART=YES, or the MVS command CANCEL jobname,ARMRESTART. By default, the server uses an ARM element type of SYSCICSS, and an ARM element identifier of the form DFHxxnn_poolname, where xx is the server type (XQ, CF or NC) and nn is the one- or two-character &SYSCLONE identifier of the MVS image. You can use these parameters to identify the servers for the purpose of overriding automatic restart options in the ARM policy.
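The default element identifier format can be reproduced with a short sketch, which might be useful when preparing ARM policy entries. The function name and the sample pool name are hypothetical; the sketch only concatenates the parts described above and does not check any length limits that ARM or the pool servers might impose.

```python
def arm_element_name(server_type: str, sysclone: str, pool_name: str) -> str:
    """Build an identifier of the form DFHxxnn_poolname.

    server_type is XQ, CF, or NC; sysclone is the one- or two-character
    &SYSCLONE value of the MVS image.
    """
    if server_type not in ("XQ", "CF", "NC"):
        raise ValueError("server type must be XQ, CF, or NC")
    if not 1 <= len(sysclone) <= 2:
        raise ValueError("&SYSCLONE value must be one or two characters")
    return f"DFH{server_type}{sysclone}_{pool_name}"


if __name__ == "__main__":
    # PRODTSQ1 is a made-up pool name used only for this example.
    print(arm_element_name("XQ", "01", "PRODTSQ1"))
```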
Chapter 8. Unit of work recovery and abend processing
A number of different events can cause the abnormal termination of transactions in CICS. These events include:
v A transaction ABEND request issued by a CICS management module.
v A program check or operating system abend (this is trapped by CICS and converted into an ASRA or ASRB transaction abend).
v An ABEND request issued by a user application program.
v A CEMT, or EXEC CICS, command such as SET TASK PURGE or FORCEPURGE.
See “Commit-failed recovery” on page 83. Backout-failed A unit of work fails while backing out updates to file control recoverable resources. (The concept of backout-failed applies in principle to any resource that performs backout recovery, but CICS file control is the only resource manager to provide backout failure support.) A partial copy of the unit of work is shunted to await retry of the backout process when the problem is resolved.
terminating transaction takes place immediately. Therefore, it does not cause any active locks to be converted into retained locks. In the case of a CICS region abend, in-flight tasks have to wait to be backed out when CICS is restarted, during which time the locks are retained to protect uncommitted resources.
Intrapartition transient data Intrapartition destinations specified as logically recoverable are restored by transaction backout. Read and write pointers are restored to what they were before the transaction failure occurred. Physically recoverable queues are recovered on warm and emergency restarts.
intended for the started task, but does not back out the START request itself. Thus the new task will start at its specified time, but the data will not be available to the started task, to which CICS will return a NOTFND condition in response to the RETRIEVE command. START with recoverable data (PROTECT) Transaction backout of the task issuing the START command causes the START request and the associated data to be backed out.
Table 1. Effect of RESTART option on started transactions (continued)

Description of non-terminal START command: Specifies nonrecoverable data
Events: Started task abends without retrieving its data
Effect of RESTART(YES): Transaction is restarted with its data still available, up to n¹ times.
Effect of RESTART(NO): START request and its data are discarded.

Description of non-terminal START command: Without data
Events: Started task abends
Effect of RESTART(YES): Transaction is restarted up to n¹ times.
Effect of RESTART(NO): -
Backout-failed recovery Backout failure support is currently provided only by CICS file control. If backout to a VSAM data set fails for any reason, CICS performs the following processing: v Invokes the backout failure global user exit program at XFCBFAIL, if this exit is enabled. If the user exit program chooses to bypass backout failure processing, the remaining actions below are not taken.
Transient data All updates to logically recoverable intrapartition queues are managed in main storage until syncpoint, or until a buffer must be flushed because all buffers are in use. TD always commits forwards; therefore, TD can never suffer a backout failure on DFHINTRA.
It might be worth initially deciding to leave a data set online for some time after a backout failure, to evaluate the level of impact the failures have on users. To recover from a media failure, re-create the data set by applying forward recovery logs to the latest backup. The steps you take depend on whether the data set is opened in RLS or non-RLS mode: v For data sets opened in non-RLS mode, set the data set offline to all CICS applications by closing all open files against the data set.
and issues a console message. If the failure has already been detected by some other (earlier) request, CICS has already started to close the SMSVSAM control ACB when the backout request fails. The backout is normally retried automatically when the SMSVSAM server becomes available. (See “Dynamic RLS restart” on page 37.) There is no need to take the data set offline. SMSVSAM server recycle during backout This error can occur only for VSAM data sets opened in RLS access mode.
This situation can be resolved only by deleting the rival record with the duplicate key value. Lock structure full error The backout required VSAM to acquire a lock for internal processing, but it was unable to do so because the RLS lock structure was full. This error can occur only for VSAM data sets opened in RLS access mode. To resolve the situation, you must allocate a larger lock structure in an available coupling facility, and rebuild the existing lock structure into the new one.
distinguishes between a commit failure where recoverable work was performed, and one for which only repeatable read locks were held. Indoubt failure recovery The CICS recovery manager is responsible for maintaining the state of each unit of work in a CICS region. For example, typical events that cause a change in the state of a unit of work are temporary suspension and resumption, receipt of syncpoint requests, and entry into the indoubt period during two-phase commit processing.
reads against VSAM data sets and has made no updates to other resources, it is safe to force the unit of work using the SET DSNAME or SET UOW commands. CICS saves enough information about the unit of work to allow it to be either committed or backed out when the indoubt unit of work is unshunted when the coordinator provides the resolution (or when the transaction wait time expires). This information includes the log records written by the unit of work.
To retrieve information about a unit of work (UOW), you can use either the CEMT, or EXEC CICS, INQUIRE UOW command. For the purposes of this illustration, the CEMT method is used. You can filter the command to show only UOWs that are associated with a particular transaction. For example, Figure 4 shows one UOW (AC0CD65E5D990800) associated with transaction UPDT.
When a UOW has been shunted indoubt, CICS retains locks on the recoverable resources that the UOW has updated. This prevents further tasks from changing the resource updates while they are indoubt. To display CICS locks held by a UOW that has been shunted indoubt, use the CEMT INQUIRE UOWENQ command. You can filter the command to show only locks that are associated with a particular UOW.
Recovery from failures associated with the coupling facility This topic deals with recovery from failures arising from the use of the coupling facility, and which affect CICS units of work.
CICS recovers after a cache failure automatically. There is no need for manual intervention (other than the prerequisite action of resolving the underlying cause of the cache failure). Lost locks recovery The failure of a coupling facility lock structure that cannot be rebuilt by VSAM creates the lost locks condition. The lost locks condition can occur only for data sets opened in RLS mode.
region that was not sharing the data set at the time the lost locks condition occurred, and on RLS access requests issued by any new units of work in CICS regions that were sharing the data set. Performing lost locks recovery for failed units of work Lost locks recovery requires that any units of work that had been updating the data set at the time of the failure must complete before the data set can be made available for general use.
simultaneously all data sets in use when the lock structure fails, each data set can be restored to service individually as soon as all its sharing CICS regions have completed lost locks recovery. Connection failure to a coupling facility cache structure If connection to a coupling facility cache structure is lost, DFSMS™ attempts to rebuild the cache in a structure to which all the SMSVSAM servers have connectivity. If the rebuild is successful, the failure is transparent to CICS.
Recovery from the failure of a sysplex is just the equivalent of multiple MVS failure recoveries. Transaction abend processing If, during transaction abend processing, another abend occurs and CICS continues, there is a risk of a transaction abend loop and further processing of a resource that has lost integrity, because of uncompleted recovery. If CICS detects that this is the case, the CICS system abends with message DFHPC0402, DFHPC0405, DFHPC0408, or DFHPC0409.
The exit code then executes as an extension of the abending task, and runs at the same level as the program that issued the HANDLE ABEND command that activated the exit. After any program-level abend exit code has been executed, the next action depends on how the exit code ends: v If the exit code ends with an ABEND command, CICS gives control to the next higher level exit code that is active. If no exit code is active at higher logical levels, CICS terminates the task abnormally.
1. CICS invokes DFHREST only when RESTART(YES) is specified in a transaction’s resource definition. 2. Ensure that resources used by restartable transactions, such as files, temporary storage, and intrapartition transient data queues, are defined as recoverable. 3. When transaction restart occurs, a new task is attached that invokes the initial program of the transaction. This is true even if the task abended in the second or subsequent unit of work, and DFHREST requested a restart. 4.
v CICS remains operational, but the task currently in control terminates. v CICS terminates (see “Shutdown requested by the operating system” on page 29). If a program check occurs when a user task is processing, the task abends with an abend code of ASRA. If a program check occurs when a CICS system task is processing, CICS terminates. If an operating system abend has occurred, CICS searches the system recovery table, DFHSRT.
Chapter 9. Communication error processing The types of communication error that can occur include terminal error processing and intersystem communication failures. Terminal error processing There are two main CICS programs that participate in terminal error processing. These are the node error program, DFHZNEP, and the terminal error program, DFHTEP. CICS controls terminals by using VTAM (in conjunction with NCP for remote terminals).
The TEP is entered once for each terminal error, and therefore should be designed to process only one error for each invocation. Intersystem communication failures An intersystem communication failure can be caused by the failure of a CICS region, or the remote system to which it is connected. A network failure can also cause the loss of the connection between CICS and a remote system.
Part 3. Implementing recovery and restart
This part describes the way you implement recovery and restart for CICS regions.
Chapter 10. Planning aspects of recovery When you are planning aspects of recovery, you must consider your applications, system definitions, internal documentation, and test plans. Application design considerations Think about recoverability as early as possible during the application design stages. This topic covers a number of aspects of design planning to consider. Questions relating to recovery requirements For ease of presentation, the following questions assume a single application.
Question 5: If a data set becomes unusable, should all applications be terminated while recovery is performed? If degraded service to any application must be preserved while recovery of the data set takes place, you will need to include procedures to do this. Question 6: Which of the files to be updated are to be regarded as vital? Identify any files that are so vital to the business that they must always be recoverable.
Before any design or programming work begins, all interested parties should agree on the statement, including:
v Those responsible for business management
v Those responsible for data management
v Those who are to use the application, including the end users, and those responsible for computer and online system operation

Designing the end user’s restart procedure
Decide how the user is to restart work on the application after a system failure.
v If a user’s printer becomes unusable (because of hardware or communication problems), consider the use of alternatives, such as the computer center’s printer, as a standby. Security Decide the security procedures for an emergency restart or a break in communications. For example, when confidential data is at risk, specify that the users should sign on again and have their passwords rechecked.
and general log data to log streams defined to the MVS system logger. For more information, see Chapter 11, “Defining system and general log streams,” on page 107. Files For VSAM files defined to be accessed in RLS mode, define the recovery attributes in the ICF catalog, using IDCAMS. For VSAM files defined to be accessed in non-RLS mode, you can define the recovery attributes in the CSD file resource definition, or in the ICF catalog, providing your level of DFSMS supports this.
normal conditions. They should, nevertheless, be tested as far as possible, to ensure that they handle the functions for which they are designed. CICS facilities, such as the execution diagnostic facility (CEDF) and command interpreter (CECI), can help to create exception conditions and to interpret program and system reactions to those conditions.
Chapter 11. Defining system and general log streams All CICS system logging and journaling is controlled by the CICS log manager, which uses MVS system logger log streams to store its output. About this task CICS logging and journaling can be divided into four broad types of activity: System logging CICS maintains a system log to support transaction backout for recoverable resources. CICS implements system logging automatically, but you can define the log stream as DUMMY to inhibit this function.
System log streams These are used by the CICS log manager and the CICS recovery manager exclusively for unit of work recovery purposes. Each system log is unique to a CICS region, and must not be merged with any other system log. General log streams These are used by the CICS log manager for all other types of logging and journaling. You can merge forward recovery records, autojournal records, and user journal records onto the same general log stream, from the same, or from different, CICS regions.
CICS log manager connects to its log stream automatically during system initialization, unless it is defined as TYPE(DUMMY) in a CICS JOURNALMODEL resource definition. Although the CICS system log is logically a single log stream, it is written to two physical log streams: a primary and a secondary. In general, it is not necessary to distinguish between these, and most references are to the system log stream.
Model log streams for CICS system logs If CICS fails to connect to its system log streams because they have not been defined, CICS attempts to have them created dynamically using model log streams. To create a log stream dynamically, CICS must specify to the MVS system logger all the log stream attributes needed for a new log stream. To determine these otherwise unknown attributes, CICS requests the MVS system logger to create the log stream using attributes of an existing model log stream definition.
However, log streams created from model log streams that are defined with the CICS default name are always assigned to the same structure within an MVS image. This might not give you the best allocation in terms of recovery considerations if you are using structures defined across two or more coupling facilities. For example, consider a two-way sysplex that uses two coupling facilities, each with one log structure defined for use by CICS system logs, structures LOG_DFHLOG_001 and LOG_DFHLOG_002.
Figure 9. Sharing system logger structures between 4 MVS images (a 4-way sysplex showing MVS images MVSA, MVSB, MVSC, and MVSD, with log structures LOG_DFHLOG_001 on CF1 and LOG_DFHLOG_002 on CF2)

Varying the model log stream name: To balance log streams across log structures when you use model log streams, you must customize the model log stream names. You cannot achieve the distribution of log streams shown in this scenario using the CICS default model name.
work. With this information, CICS continues reading backwards, but this time reading only the records for units of work that are identified in the activity keypoint. Reading continues until CICS has read all the records for the units of work identified by the activity keypoint. This process means that completed units of work, including shunted backout-failed and commit-failed units of work, are ignored in this part of the log scan. This is significant for the retrieval of user-written log records.
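The two-phase backward scan described above can be pictured with a small model. This is only an illustrative sketch, not CICS code: the record kinds ("START", "UPDATE", "KEYPOINT"), the LogRecord type, and the idea that a "START" record marks the oldest record of a unit of work are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """Hypothetical log record; real system log records look nothing like this."""
    kind: str                          # "START", "UPDATE", or "KEYPOINT"
    uow: Optional[str] = None          # unit-of-work id (None for a keypoint)
    inflight: frozenset = frozenset()  # UOW ids named by a keypoint


def backward_scan(log):
    """Return the records a restart would need, newest first."""
    needed = []
    tracked = None  # UOWs named by the keypoint; None until a keypoint is found
    for rec in reversed(log):
        if tracked is None:
            # Phase 1: read every record back to the most recent keypoint.
            if rec.kind == "KEYPOINT":
                tracked = set(rec.inflight)
            else:
                needed.append(rec)
        elif rec.uow in tracked:
            # Phase 2: read only records of UOWs named in the keypoint.
            needed.append(rec)
            if rec.kind == "START":
                tracked.discard(rec.uow)  # oldest record of this UOW reached
        if tracked is not None and not tracked:
            break  # every UOW named in the keypoint has been read completely
    return needed


example_log = [
    LogRecord("START", "U1"),
    LogRecord("UPDATE", "U1"),
    LogRecord("START", "U2"),
    LogRecord("UPDATE", "U2"),          # U2 completed before the keypoint
    LogRecord("KEYPOINT", inflight=frozenset({"U1"})),
    LogRecord("START", "U3"),
    LogRecord("UPDATE", "U3"),
    LogRecord("UPDATE", "U1"),
]
print([rec.uow for rec in backward_scan(example_log)])
# prints ['U1', 'U3', 'U3', 'U1', 'U1']
```

In this model the records of U2, which completed before the keypoint, are never returned, which mirrors the point that completed units of work are ignored in this part of the scan.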
v If a system log stream exceeds the primary storage space allocated, it spills onto secondary storage. (For a definition of primary and secondary storage, see the CICS Transaction Server for z/OS Installation Guide.) The resulting I/O can adversely affect system performance. v If the interval between activity keypoints is long, the volume of data could affect restart times.
Writing user-recovery data About this task You should write only recovery-related records to the system log stream. You can do this using the commands provided by the application programming interface (API) or the exit programming interfaces (XPI). This is important because user recovery records are presented to a global user exit program enabled at the XRCINPT exit point.
About this task The dddd value specifies the minimum number of days for which data is to be retained on the log. You are strongly recommended not to use the system log for records that need to be kept. Any log and journal data that needs to be preserved should be written to a general log stream. See the CICS System Definition Guide for advice on how to create general log stream data sets.
2. Define a general log stream for forward recovery data. If you do not define a general log stream, CICS attempts to create a log stream dynamically. See “Model log streams for CICS general logs” for details. 3. Decide how you want to merge forward recovery data from different CICS regions into one or more log streams. See “Merging data on shared general log streams” on page 118 for details.
Merging data on shared general log streams Unlike system log streams, which are unique to one CICS region, general log streams can be shared between many CICS regions. This means that you can merge forward recovery data from a number of CICS regions onto the same forward recovery log stream. About this task v You can use the same forward recovery log stream for more than one data set. You do not have to define a log stream for each forward-recoverable data set.
About this task The CICS-supplied group, DFHLGMOD, includes a JOURNALMODEL for the log of logs, called DFHLGLOG, which has a log stream name of &USERID..CICSVR.DFHLGLOG. Note that &USERID resolves to the CICS region userid, and if your CICS regions run under different RACF user IDs, the DFHLGLOG definition resolves to a unique log of logs log stream name for each region.
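The way the &USERID symbol yields a different log of logs name for each region user ID can be sketched as follows. This is a minimal illustration, not CICS logic: the function name and the region user IDs are hypothetical, and the sketch assumes that a period immediately after a symbol ends the symbol and is dropped, which is why the template contains two periods.

```python
import re


def resolve_log_stream_name(template, symbols):
    """Substitute &NAME symbols in a log stream name template.

    Assumption: the first period after a symbol terminates the symbol
    and does not appear in the resolved name.
    """
    def substitute(match):
        return symbols[match.group(1)]

    return re.sub(r"&([A-Z][A-Z0-9]*)\.?", substitute, template)


TEMPLATE = "&USERID..CICSVR.DFHLGLOG"
for userid in ("CICSHA01", "CICSHA02"):  # hypothetical region user IDs
    print(resolve_log_stream_name(TEMPLATE, {"USERID": userid}))
# prints CICSHA01.CICSVR.DFHLGLOG
# prints CICSHA02.CICSVR.DFHLGLOG
```

Two regions that run under the same user ID would resolve to the same name in this model.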
v In a format compatible with utility programs written for versions of CICS that use the log manager for logging and journaling. See the CICS Operations and Utilities Guide for more information about using the LOGR SSI to access log stream data, and for sample JCL. If you plan to write your own utility program to read log stream data, see the CICS Customization Guide for information about log stream record formats.
Operating a recovery process that is independent of time-stamps in the system log data ensures that CICS can restart successfully after an abnormal termination, even if the failure occurs shortly after local time has been put back. Offline utility program, DFHJUP Changing the local time forward has no effect on the processing of system log streams or general log streams by the CICS utility program, DFHJUP.
Chapter 12. Defining recoverability for CICS-managed resources This section describes what to do to ensure that you can recover the resources controlled by CICS on behalf of your application programs.
SPURGE({NO|YES}) This option indicates whether the transaction is initially system-purgeable; that is, whether CICS can purge the transaction as a result of: v Expiry of a deadlock timeout (DTIMOUT) delay interval v A CEMT, or EXEC CICS, SET TASK(id) PURGE|FORCEPURGE command. The default is SPURGE(NO). TPURGE({NO|YES}) This option indicates whether the transaction is system-purgeable in the event of a (non-VTAM) terminal error. The default is TPURGE(NO).
Recovery for files A CICS file is a logical view of a physical data set, defined to CICS in a file resource definition with an 8-character file name.
Forward recovery For VSAM files, you can use a forward recovery utility, such as CICSVR, when online backout processing has failed as a result of some physical damage to the data set. For forward recovery: v Create backup copies of data sets. v Record after-images of file changes in a forward recovery stream. CICS does this for you automatically if you specify that you want forward recovery support for the file.
uses the ICF catalog entry recovery attributes instead of the FILE resource. To force CICS to use the FILE resource attributes instead of the catalog, set the NONRLSRECOV system initialization parameter to FILEDEF.
v You define the recovery attributes for BDAM files in file entries in the file control table (FCT).

VSAM files accessed in non-RLS mode
You can specify support for both forward and backward recovery for VSAM files using the RECOVERY and FWDRECOVLOG options.
VSAM files accessed in RLS mode If you specify file definitions that open a data set in RLS mode, specify the recovery options in the ICF catalog. The recovery options on the CICS file resource definitions (RECOVERY, FWDRECOVLOG, and BACKUPTYPE) are ignored if the file definition specifies RLS access. The VSAM parameters LOG and LOGSTREAMID, on the access methods services DEFINE CLUSTER and ALTER commands, determine recoverability for the entire sphere.
INQUIRE DSNAME command returns values from the VSAM base cluster block (BCB). However, because base cluster block (BCB) recovery values are not set until the first open, if you issue an INQUIRE DSNAME command before the first file is opened, CICS returns NOTAPPLIC for RECOVSTATUS. BDAM files You can specify CICS support for backward recovery for BDAM files using the LOG parameter on the DFHFCT TYPE=FILE macro.
About this task If you use XFCNREC to suppress open failures that are a result of inconsistencies in the backout settings, CICS issues a message to warn you that the integrity of the data set can no longer be guaranteed. Any INQUIRE DSNAME RECOVSTATUS command that is issued from this point onward will return NOTRECOVABLE, regardless of the recovery attribute that CICS has previously enforced on the base cluster.
  - File is defined with RECOVERY(ALL): the open fails.
– Base cluster has RECOVERY(ALL):
  - File is defined with RECOVERY(NONE): the open fails.
  - File is defined with RECOVERY(BACKOUTONLY): the open fails.
  - File is defined with RECOVERY(ALL): the open proceeds unless FWDRECOVLOG specifies a different journal id from the base cluster, in which case the open fails.

Any failure to open a file against a data set results in a message to the console. If necessary, the recovery options must be changed.
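The consistency check made when a file is opened can be expressed as a small decision function. The sketch below is a model for reasoning about the cases, not CICS code. It assumes that a mismatch between the file definition and the attributes already established for the base cluster always fails the open, which matches the cases listed above. The function name and return format are hypothetical, and the effect of an XFCNREC exit program is not modeled.

```python
RECOVERY_VALUES = ("NONE", "BACKOUTONLY", "ALL")


def check_open(base_recovery, file_recovery, base_fwdlog=None, file_fwdlog=None):
    """Return (allowed, reason) for opening a file against a base cluster.

    base_recovery / base_fwdlog: attributes already in force for the base cluster.
    file_recovery / file_fwdlog: attributes on the file definition being opened.
    """
    for value in (base_recovery, file_recovery):
        if value not in RECOVERY_VALUES:
            raise ValueError(f"unknown RECOVERY value: {value}")
    if file_recovery != base_recovery:
        return False, "open fails: RECOVERY differs from the base cluster"
    if base_recovery == "ALL" and file_fwdlog != base_fwdlog:
        return False, "open fails: FWDRECOVLOG differs from the base cluster"
    return True, "open proceeds"


print(check_open("ALL", "BACKOUTONLY"))
print(check_open("ALL", "ALL", base_fwdlog=21, file_fwdlog=21))
print(check_open("ALL", "ALL", base_fwdlog=21, file_fwdlog=22))
```

A check like this could be run over file definitions before installation to find the combinations that would otherwise produce console messages at open time.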
For more information about allocation and space requirements, see the CICS System Definition Guide.) For extrapartition transient data considerations, see “Recovery for extrapartition transient data” on page 134. You must specify the name of every intrapartition transient data queue that you want to be recoverable in the queue definition.
Figure 12. Illustration of recovery of a physically recoverable TD queue (two panels: the queue before the failure, when CICS abends and causes the UOW to fail in-flight, and the state of the queue after emergency restart; each panel shows Items 1 to 4 with the next READ and next WRITE pointers)

Making intrapartition TD physically recoverable can be useful in the case of some CICS queues.
Recovery for extrapartition transient data CICS does not recover extrapartition data sets. If you depend on extrapartition data, you will need to develop procedures to recover data for continued execution on restart following either a controlled or an uncontrolled shutdown of CICS.
Output extrapartition data sets The recovery of output extrapartition data sets is somewhat different from the recovery of input data sets. For a tape output data set, use a new output tape on restart. You can then use the previous output tape if you need to recover information recorded before termination. To avoid losing data in tape output buffers on termination, you can write unblocked records.
Define temporary storage queues as recoverable using temporary storage model resource definitions, as shown in the following example define statements:

CEDA DEFINE DESCRIPTION(Recoverable TS queues for START requests)
     TSMODEL(RECOV1) GROUP(TSRECOV) PREFIX(DF)
     LOCATION(AUXILIARY) RECOVERY(YES)
CEDA DEFINE DESCRIPTION(Recoverable TS queues for BMS)
     TSMODEL(RECOV2) GROUP(TSRECOV) PREFIX(**)
     LOCATION(AUXILIARY) RECOVERY(YES)
CEDA DEFINE DESCRIPTION(Recoverable TS queues for BMS)
     TSMODEL(RECOV3) GROUP(TSRECOV) PR
About this task CICS uses Business Transaction Services (BTS) to ensure that persistent messages are recovered in the event of a CICS system failure. For this to work correctly, follow these steps: Procedure 1. Use IDCAMS to define the local request queue and repository file to MVS. You must specify a suitable value for STRINGS for the file definition. The default value of 1 is unlikely to be sufficient, and you are recommended to use 10 instead. 2.
2. For each local request queue, define a QLOCAL object. Use the following command:

DEFINE QLOCAL('queuename')
       DESCR('description')
       PROCESS(processname)
       INITQ('initiation_queue')
       TRIGGER TRIGTYPE(FIRST)
       TRIGDATA('default_target_service')
       BOTHRESH(nnn)
       BOQNAME('requeuename')

where:
v queuename is the local queue name.
v processname is the name of the process instance that identifies the application started by the queue manager when a trigger event occurs. Specify the same name on each QLOCAL object.
not usable, message DFHPI0117 is issued, and CICS continues without BTS, using the existing channel-based container mechanism. If a CICS failure occurs before the Web service starts or completes processing, BTS recovery ensures that the process is rescheduled when CICS is restarted. If the Web service abends and backs out, the BTS process is marked complete with an ABENDED status. For request messages that require a response, a SOAP fault is returned to the Web service requester.
Chapter 13. Programming for recovery When you are designing your application programs, you can include recovery facilities that are provided by CICS; for example, you can use global user exits for backout recovery.
v Progress transaction, to check on progress through the application. Such a function could be used after a transaction failure or after an emergency restart, as well as at any time during normal operation. For example, it could be designed to find the correct restart point at which the terminal user should recommence the interrupted work. This would be particularly relevant in a pseudo-conversation.
SAA-compatible applications The resource recovery element of the Systems Application Architecture® (SAA) common programming interface (CPI) provides an alternative to the standard CICS application program interface (API) if you need to implement SAA-compatible applications. The resource recovery facilities provided by the CICS implementation of the SAA resource recovery interface are the same as those provided by CICS API.
committed in one unit of work, but the transaction is to continue with one or more units of work for further processing. 3. Where file or database updates must be kept in step, make sure that your application does them in the same unit of work. This approach ensures that those updates will all be committed together or, in the event of the unit of work being interrupted, the updates will back out together to a consistent state.
back out only the updates made during that individual step; the application is responsible for restarting at the appropriate point in the conversation, which might involve recreating a screen format. However, other tasks might try to update the database between the time when update information is accepted and the time when it is applied to the database. Design your application to ensure that no other application can update the database at a time when it would corrupt the updating by your own application.
v Data tables (user-maintained) v Coupling facility data tables CICS can return all these resources to their status at the beginning of an in-flight unit of work if a task ends abnormally. Temporary storage (auxiliary) You can use a temporary storage item to communicate between transactions. (For this purpose, the temporary storage item needs to be unique to the terminal ID. If the terminal becomes unavailable, the transaction sequence is interrupted until the terminal is again available.
Procedure
v Arrange for all transactions to access files in a sequence agreed in advance. This could be a suitable subject for installation standards. Be extra careful if you allow updates through multiple paths.
v Enforce explicit installation enqueueing standards so that all applications do the following:
  1. Enqueue by the same character string
  2. Use those strings in the same sequence.
v Always access records within a file in the same sequence.
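The reason an agreed sequence prevents deadlock can be shown with ordinary thread locks standing in for enqueues. This is a sketch of the general technique only; it does not use any CICS interface. The resource names, the helper names, and the choice of alphabetical order as the "agreed sequence" are all assumptions for the example.

```python
import threading

_locks = {}                         # resource name -> lock (stand-in for an enqueue)
_registry_guard = threading.Lock()  # protects the _locks dictionary itself


def _lock_for(name):
    with _registry_guard:
        return _locks.setdefault(name, threading.Lock())


def enqueue_order(names):
    """The agreed sequence: here, simply alphabetical order without duplicates."""
    return sorted(set(names))


def update_resources(names, action):
    """Acquire every named resource in the agreed order, run action, then release."""
    ordered = enqueue_order(names)
    held = []
    try:
        for name in ordered:
            lock = _lock_for(name)
            lock.acquire()
            held.append(lock)
        return action(ordered)
    finally:
        for lock in reversed(held):
            lock.release()


# Two tasks name the same resources in opposite orders, yet both acquire
# ACCOUNTS before PAYROLL, so neither can hold one while waiting for the other.
t1 = threading.Thread(target=update_resources, args=(["ACCOUNTS", "PAYROLL"], lambda o: None))
t2 = threading.Thread(target=update_resources, args=(["PAYROLL", "ACCOUNTS"], lambda o: None))
t1.start(); t2.start()
t1.join(); t2.join()
```

If each task instead acquired the resources in the order it happened to name them, the two tasks could each hold one resource and wait indefinitely for the other.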
The abend processing should analyze the cause of failure as far as possible, and restart the task if appropriate. Ensure that either the user or the master terminal operator can take appropriate action to repeat the updates. You could, for example, allow the user to reinitiate the task. An alternative solution is for the started transaction to issue a START command specifying its own TRANSID. Immediately before issuing the RETURN command, the transaction should cancel the START command.
About this task Such queuing can be done on a transient data queue associated with a terminal. A special transaction, triggered when the terminal is available, can then format and present the data. For recovery and restart purposes: v The transient data queue should be specified as logically recoverable. v If the transaction that presents the data fails, dynamic transaction backout will be called.
For example, if file input and output errors occur (where the default action is merely to abend the task), you might want to inform the master terminal operator, who can decide to terminate CICS, especially if one of the files is critical to the application. Your installation might have standards relating to the use of RESP options or HANDLE CONDITION commands. Review these for each new application.
v DTB takes place only after program level abend exits (if any) have attempted cleanup or logical recovery. Transaction restart after DTB For each transaction where DTB is specified, consider also specifying automatic transaction restart. For example, for transactions that access DL/I databases (and are subject to program isolation deadlock), automatic transaction restart is usually specified.
v Send a message to the terminal operator if, for example, you believe that the abend is due to bad input data.
START TRANSID commands In a transaction that uses the START TRANSID command to start other transactions, you must maintain logical data integrity. You can maintain data integrity by following these guidelines: 1. Always use the PROTECT option of the START TRANSID command. This ensures that if the START-issuing task is backed out, the new task will not start. 2.
About this task There are two forms of locking: 1. The implicit locking functions performed by CICS (or the access method) whenever your transactions issue a request to change data. These are described under: v “Implicit locking for files” v “Implicit enqueuing on logically recoverable TD destinations” on page 157 v “Implicit enqueuing on recoverable temporary storage queues” on page 157 v “Implicit enqueuing on DL/I databases with DBCTL” on page 158. 2.
Figure 13. Locking during updates to nonrecoverable files. This figure illustrates two tasks updating the same record or control interval. (The figure shows task A and task B, each running from start of task through READ UPDATE and WRITE to syncpoint. Each task holds the lock only during its update, and task B waits while task A is updating.)
Abbreviations: SOT: Start of task. SP: Syncpoint.
Note: For BDAM and VSAM non-RLS, this locking grants exclusive control.
The extended period of locking is needed to avoid an update committed by one task being backed out by another. (Consider what could happen if the nonextended locking action shown in Figure 13 on page 155 was used when updating a recoverable file. If task A abends just after task B has reached syncpoint and has thus committed its changes, the subsequent backout of task A returns the file to the state it was in at the beginning of task A, and task B’s committed update is lost.
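The lost update described here can be reproduced with a few lines of sequential simulation. The sketch is illustrative only: the record value, the amounts, and the function name are made up, and the two "tasks" are simply steps in one function rather than real concurrent work.

```python
def run(extended_locking):
    """Simulate task A abending after task B may have committed.

    Returns (final value in the file, value that task B committed).
    """
    record = {"value": 100}

    # Task A: READ UPDATE then WRITE. The change is not yet committed.
    before_image_a = record["value"]
    record["value"] = 150
    # Nonextended locking frees the record at WRITE; extended locking
    # keeps it locked until task A reaches syncpoint or is backed out.
    locked_by_a = extended_locking

    # Task B: wants to add 10 to the record and then syncpoint.
    b_committed = None
    if not locked_by_a:
        record["value"] += 10            # B builds on A's uncommitted 150
        b_committed = record["value"]    # B commits 160

    # Task A abends: backout restores A's before-image and frees the lock.
    record["value"] = before_image_a
    locked_by_a = False

    # With extended locking, B was waiting and only runs now.
    if b_committed is None:
        record["value"] += 10
        b_committed = record["value"]

    return record["value"], b_committed


print(run(extended_locking=False))  # prints (100, 160)
print(run(extended_locking=True))   # prints (110, 110)
```

In the first case the file ends at 100 although task B committed 160, so B's committed update has been lost. In the second case the file holds exactly what B committed.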
The backout fails because a duplicate key is detected in the AIX indicated by message DFHFC4701, with a failure code of X'F0'. There is no locking on the AIX® key to prevent the second task taking the key before the end of the first task’s unit of work. If there is an application requirement for this sort of operation, you should use the CICS enqueue mechanism to reserve the key until the end of the unit of work. 5.
enqueuing on temporary storage queues where concurrently executing tasks can read and change queue(s) with the same temporary storage identifier. (See “Explicit enqueuing (by the application programmer).
After a task has issued an ENQ RESOURCE(data-area) command, any other task that issues an ENQ RESOURCE command with the same data-area parameter is suspended until the task issues a matching DEQ RESOURCE(data-area) command, or until the unit of work ends. Note: Enqueueing on more than one resource concurrently might create a deadlock between transactions.
v If both deadlocked resources are CICS resources (but not both VSAM resources), or one is CICS and the other DL/I, CICS abends the task whose DTIMOUT period elapses first. It is possible for both tasks to time out simultaneously. If neither task has a DTIMOUT period specified, they both remain suspended indefinitely, unless one of them is canceled by a master terminal command. The abended task may then be backed out by dynamic transaction backout, as described in “Transaction backout” on page 74.
Procedure v Enable them in PLT programs in the first part of PLT processing. v Specify them on the system initialization parameter, TBEXITS. This takes the form TBEXITS=(name1,name2,name3,name4,name5,name6), where name1, name2, name3, name4, name5, and name6 are the names of your global user exit programs for the XRCINIT, XRCINPT, XFCBFAIL, XFCLDEL, XFCBOVER, XFCBOUT exit points.
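Because the TBEXITS values are positional, it is easy to put a program name in the wrong slot. The following Python sketch builds the parameter string from a mapping of exit point to program name, in the order given above. The `build_tbexits` function and the program names in the usage example are hypothetical, and the sketch assumes that you name a program for every exit point.

```python
# Exit points in the positional order given in the text above.
EXIT_POINTS = ("XRCINIT", "XRCINPT", "XFCBFAIL", "XFCLDEL", "XFCBOVER", "XFCBOUT")


def build_tbexits(programs: dict) -> str:
    """Return a TBEXITS=(...) string from a mapping of exit point to program.

    Assumption of this sketch: a program is named for every exit point.
    """
    missing = [point for point in EXIT_POINTS if point not in programs]
    if missing:
        raise ValueError("no program named for: " + ", ".join(missing))
    names = [programs[point] for point in EXIT_POINTS]
    for name in names:
        if not 1 <= len(name) <= 8:      # CICS program names are 1-8 characters
            raise ValueError("invalid program name: " + repr(name))
    return "TBEXITS=(" + ",".join(names) + ")"


# Hypothetical program names, deliberately supplied out of order.
example = build_tbexits({
    "XFCBOUT": "EXITF", "XRCINIT": "EXITA", "XFCLDEL": "EXITD",
    "XRCINPT": "EXITB", "XFCBOVER": "EXITE", "XFCBFAIL": "EXITC",
})
print(example)
```

Keying the input by exit point rather than by position means the order in which you list the programs does not matter.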
XFCLDEL global user exit
XFCLDEL is invoked when backing out a unit of work that performed a write operation to a VSAM ESDS or a BDAM data set.
XFCBOVER global user exit
XFCBOVER is invoked whenever CICS is about to decide not to back out an uncommitted update, because the record could have been updated by a non-RLS batch program. This situation can occur after a batch program has opened a data set, even though it has retained locks, by overriding the RLS data set protection.
Chapter 14. Using a program error program (PEP)
The program error program (PEP) gains control after all program-level ABEND exit code has executed and after dynamic transaction backout has been performed.
About this task
There is only one program error program for the whole region.
Procedure
1. Decide whether you want to use the CICS-supplied program error program, DFHPEP, or create your own.
7. The CICS transaction failure program, DFHTFP, links to DFHPEP before transaction backout is performed. This means resources used by the abending transaction may not have been released. DFHPEP needs to be aware of this, and might need logic to handle resources that are still locked. 8. Do not use the restart function for distributed transactions whose principal facilities are APPC links.
When you have corrected the error, you can re-enable the relevant installed transaction definition to allow terminals to use it. You can also disable transaction identifiers when transactions are not to be accepted for application-dependent reasons, and can enable them again later. The CICS Resource Definition Guide tells you more about the master terminal operator functions.
166 CICS TS for z/OS 4.
Chapter 15. Resolving retained locks on recoverable resources This section describes how you can locate and resolve retained locks that are preventing access to resources, either by CICS transactions or by batch jobs. About this task Although the main emphasis in this section is on how you can switch from RLS to non-RLS access mode in preparation for batch operations involving the use of RLS-accessed data sets, it is also of interest for CICS online operation.
The RLS quiesce and unquiesce functions The RLS quiesce and unquiesce functions are initiated by a CICS command in one region, and propagated by the VSAM RLS quiesce interface to other CICS regions in the sysplex. When these functions are complete, the ICF catalog shows the quiesce state of the target data set.
Figure 16. The CICS RLS quiesce operation with the CICS quiesce exit program
(Diagram not reproduced. It shows two MVS images, MVS1 running CICS AOR1 and MVS2 running CICS AOR2, each with a CICS RLS quiesce exit, a CICS task CFQR, and an SMSVSAM server, together with the MVS coupling facility and the ICF catalog. In AOR1, which also shows CICS task CFQS, a user transaction issues EXEC CICS SET DSNAME(ds_name) QUIESCED. The numbers 1 to 8 and 4a in the diagram correspond to the notes that follow.)
Note:
1. A suitably-authorized user application program (AOR1 in the diagram) issues an EXEC CICS SET DSNAME(...
(4a) SMSVSAM uses the coupling facility to propagate the request to the other SMSVSAM servers in the sysplex.
5. The CICS RLS quiesce exit program schedules a CICS region task (CFQR) to perform asynchronously the required quiesce actions in that CICS region.
6. When CICS has closed all open RLS ACBs for the data set, CICS issues the “quiesce completed” notification (the IDAQUIES macro QUICMP function) direct to SMSVSAM through the control ACB interface.
With the new RLS quiesce mechanism, you do not have to close a data set to take a non-BWO backup. However, because this causes new transactions to be abended, you may prefer to quiesce your data sets before taking a non-BWO backup. Non-BWO data set backup end A quiesce interface function initiated by DFSMSdss at the end of non-BWO backup processing (or to cancel a non-BWO backup request).
Lost locks recovery complete A quiesce interface function initiated by VSAM. VSAM takes action associated with a sphere having completed lost locks recovery on all CICS regions that were sharing the data set. SMSVSAM invokes the CICS RLS quiesce exit program in each region that is registered with an SMSVSAM control ACB. Until lost locks recovery is complete, CICS disallows any new requests to the data set (that is, only requests issued as part of the recovery processing are possible).
Note: If your file definitions specify an LSR pool id that is built dynamically by CICS, consider using the RLSTOLSR system initialization parameter.
v Open the files in non-RLS read-only mode in CICS.
v Concurrently, run batch non-RLS.
v When batch work is finished:
  – Close the read-only non-RLS mode files in CICS.
  – Re-define the files as RLS mode and with update operations. You can do this using the CEMT, or EXEC CICS, SET FILE command.
  – Unquiesce the data sets.
The remainder of this topic on switching to non-RLS access mode describes the options that are available if you need to switch to non-RLS mode and are prevented from doing so by retained locks. Resolving retained locks before opening data sets in non-RLS mode VSAM sets an ‘RLS-in-use’ indicator in the ICF catalog cluster entry when a data set is successfully opened in RLS mode.
About this task However, it does know about the uncommitted changes that are protected by such locks, and why the changes have not yet been committed successfully. CICS uses this information to help you resolve any retained locks that are preventing you from switching to non-RLS access mode. INQUIRE DSNAME You can use the INQUIRE DSNAME command to find out whether the data set has any retained locks or if it is waiting for lost locks recovery by any CICS region. INQUIRE DSNAME(...
v Commit failure, where a unit of work has failed during the commit action. The commit action may be either to commit the changes made by a completed unit of work, or to commit the successful backout of a unit of work. This failure is caused by a failure of the SMSVSAM server, which is returned as RLSSERVER on the CAUSE option and COMMITFAIL or RRCOMMITFAIL on the REASON option of the INQUIRE UOWDSNFAIL command. (RLSSERVER is also returned as the WAITCAUSE on an INQUIRE UOW command.
4. If a unit of work has been shunted with a different CAUSE and REASON, review the descriptions of these values in the INQUIRE UOWDSNFAIL command to determine what action to take to allow the shunted unit of work to complete. Choosing data availability over data integrity There may be times when you cannot resolve all the retained locks correctly, either because you cannot easily remedy the situations preventing the changes from being committed, or because of insufficient time.
Diagnostic messages DFHFC3003 and DFHFC3010 are issued for each log record. If a data set has both indoubt-failed and other (backout- or commit-) failed units of work, deal with the indoubt UOWs first, using SET DSNAME UOWACTION, because this might result in other failures which can then be cleared by the SET DSNAME RESETLOCKS command. The batch-enabling sample programs CICS provides a suite of sample programs to help you automate batch preparation procedures.
  CEMT INQUIRE UOWDSNFAIL DSN('RLS.ACCOUNTS.ESDS.DBASE1')
  STATUS: RESULTS
   Dsn(RLS.ACCOUNTS.ESDS.DBASE1 ) Dat Del Uow(AA6DB080C40CEE01) Rls
   Dsn(RLS.ACCOUNTS.ESDS.DBASE1 ) Dat Ind Uow(AA6DB08AC66B4000) Rls
The display shows a REASON code of DELEXITERROR (Del) for one unit of work, and INDEXRECFULL (Ind) for the other.
2. A CEMT SET DSNAME(...
X’AA6DB08AC66B4000’ and file ACCNT1 . Update was a write-add made by transaction WKLY at terminal T583 under task number 00027. Key length 4, data length 7, base ESDS RBA X’00000DDF’, record key X’00000DDF’ A special case: lost locks If a lost locks condition occurs, any affected data set remains in a lost locks state until all CICS regions have completed lost locks recovery for the data set.
v Do not use DENYNONRLSUPDATE if you run non-RLS work after specifying PERMITNONRLSUPDATE. The permit status is automatically reset by the CICS regions that hold retained locks when they open the data set in RLS mode. Post-batch processing After a non-RLS program has been permitted to override retained locks, the uncommitted changes that were protected by those locks must not normally be allowed to back out. This is because the non-RLS program may have changed the protected records.
Coupling facility data table retained locks Recoverable coupling facility data table records can be the subject of retained locks, like any other recoverable CICS resource that is updated in a unit of work that subsequently fails. A recoverable CFDT supports indoubt and backout failures. If a unit of work fails when backing out an update to a CFDT, or if it fails indoubt during syncpoint processing, the locks are converted to retained locks and the unit of work is shunted.
Chapter 16. Moving recoverable data sets that have retained locks There may be times when you need to re-define a VSAM data set by creating a new data set and moving the data from the old data set to the new data set. About this task For example, you might need to do this to make a data set larger. In this case, note that special action is needed in the case of data sets that are accessed in RLS mode by CICS regions. Recoverable data sets may have retained locks associated with them.
The following access method services examples assume that CICS.DATASET.A needs to be redefined and the data moved to a data set named CICS.DATASET.B, which is then renamed:
  DEFINE CLUSTER (NAME(CICS.DATASET.B) ...
  REPRO INDATASET(CICS.DATASET.A) OUTDATASET(CICS.DATASET.B)
  DELETE CICS.DATASET.A
  ALTER CICS.DATASET.B NEWNAME(CICS.DATASET.A)
If the recoverable data set has associated RLS locks, these steps are not sufficient because:
v The REPRO command copies the data from CICS.DATASET.A to CICS.DATASET.
This makes the data set unavailable while the move from old to new is in progress, and also allows the following unbind operation to succeed.
4. Issue the SHCDS FRUNBIND subcommand to unbind any retained locks against the old data set. For example:
  SHCDS FRUNBIND(CICS.DATASET.A)
This enables SMSVSAM to preserve the locks ready for rebinding later to the new data set.
v Create a new empty data set into which the copy is to be restored, and use IMPORT to copy the data from the exported version of the data set to the new empty data set.
v Use SHCDS FRSETRR to mark the original data set as being under maintenance.
v Use SHCDS FRUNBIND to unbind the locks from the original data set.
v Use SHCDS FRSETRR to mark the new data set as being under maintenance.
v Delete the original data set.
v Rename the new data set back to the old name.
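The order of the lock-preserving steps matters, so it can help to generate the access method services commands from a single function rather than typing them by hand. The Python sketch below is a hedged illustration: the `lock_preserving_steps` function is hypothetical, the FRSETRR operand format is assumed to match the SHCDS FRUNBIND(CICS.DATASET.A) example shown earlier, and the final FRBIND step is taken from the rebinding step that this guide describes for forward recovery. It assumes the data has already been copied into the new data set, and it omits any later steps of the full procedure.

```python
def lock_preserving_steps(old: str, new: str) -> list:
    """Return the commands, in order, for moving retained locks from the
    old data set to its replacement once the data has been copied."""
    return [
        f"SHCDS FRSETRR({old})",        # mark the original as under maintenance
        f"SHCDS FRUNBIND({old})",       # unbind the retained locks
        f"SHCDS FRSETRR({new})",        # mark the new data set as under maintenance
        f"DELETE {old}",                # delete the original data set
        f"ALTER {new} NEWNAME({old})",  # rename the new data set to the old name
        f"SHCDS FRBIND({old})",         # re-bind the locks under the old name
    ]


steps = lock_preserving_steps("CICS.DATASET.A", "CICS.DATASET.B")
for command in steps:
    print(command)
```

The unbind must precede the delete, and the bind must follow the rename, because the locks are associated with the data set name.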
Chapter 17. Forward recovery procedures If a data set that is being used by CICS fails, perhaps because of physical damage to a disk, you can recover the data by performing forward recovery of the data set. About this task Your forward recovery procedures can be based either on your own, or an ISV-supplied utility program for processing the relevant CICS forward recovery log streams, or you can use CICS VSAM Recovery. See CICS VSAM Recovery for z/OS for details of forward recovery using CICS VR.
Recovery of data set with volume still available The procedure described here is necessary to preserve any retained locks that are held by SMSVSAM against the data in the old data set. Unless you follow all the steps of this procedure, the locks will not be valid for the new data set, with potential loss of data integrity. The following steps outline the procedure to forward recover a data set accessed in RLS mode.
9. Alter the new data set name Use access method services to rename the new data set to the name of the old data set. ALTER CICS.DATASETB NEWNAME(CICS.DATASETA) You must give the restored data set the name of the old data set to enable the following bind operation to succeed. 10. Issue the FRBIND subcommand Use this access method services SHCDS subcommand to re-bind to the recovered data set all the retained locks that were unbound from the old data set. 11.
There are several methods you can use to recover data sets after the loss of a volume. Whichever method you use (whether a volume restore, a logical data set recovery, or a combination of both), you need to ensure SMSVSAM puts data sets into a lost locks state to protect data integrity. This means that, after you have carried out the initial step of recovering the volume, your data recovery process must include the following command sequence: 1. ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER 2.
This is because CICS cannot run the lost locks recovery process until the data sets are available, and the data sets are made available only after the CICS VR recovery jobs are finished. If you physically restore the volume, however, the data sets that need to be forward recovered are immediately available for backout. In this case you must use CFVOL QUIESCE before the volume restore to prevent access to the restored volume until that protection can be transferred to CICS (by using the CICS SET DSNAME(...
This clears the SMSVSAM CFVOL-QUIESCED state and allows SMSVSAM RLS access to the volume. CICS ensures that access is not allowed to the data sets that will eventually be forward recovered, but the volume is available for other data sets. 6. Run data set forward recovery jobs.
PIDS/565501800 LVLS/510 MS/DFHFC0152 RIDS/DFHFCCA PTFS/UN92873 REGS/GR15 VALU/00000008 PCSS/IDARETLK PRCS/000000A9
+DFHFC0312 ADSWA03A Message DFHFC0152 data set RLSADSW.VF04D.DATAENDB
We used the CEMT command INQUIRE UOWDSNFAIL IOERROR to display the UOWs that were shunted as a result of the I/O errors. For example, on the CICS region ADSWA01D the command showed the following shunted UOWs:
  INQUIRE UOWDSNFAIL IOERROR
  STATUS: RESULTS
   Dsn(RLSADSW.VF04D.TELLCTRL Uow(ADD18C2DA4D5FC03)
   Dsn(RLSADSW.VF04D.
effect in CICS region ADSWA03C was shown by the following response to an INQUIRE UOWDSNFAIL command for data set RLSADSW.VF01D.BANKACCT:
  INQUIRE UOWDSNFAIL DSN(RLSADSW.VF01D.BANKACCT)
  STATUS: RESULTS
   Dsn(RLSADSW.VF01D.BANKACCT ) Dat Ope Uow(ADD19B8166268E02) Rls
   Dsn(RLSADSW.VF01D.BANKACCT ) Rls Com Uow(ADD19B9D93DE1200) Rls
After the SMSVSAM servers terminated, all RLS-mode files were automatically closed by CICS and further RLS access prevented.
6.
work. Assuming that all CICS regions are active, and there are no indoubt UOWs, lost locks processing, for all data sets except the ones on the failed volume, should complete quickly.
9. In this example, CEMT INQUIRE UOWDSNFAIL on CICS region ADSWA01D showed UOW failures only for the RLSADSW.VF04D.TELLCTRL and RLSADSW.VF04D.DATAENDB data sets:
  INQUIRE UOWDSNFAIL
  STATUS: RESULTS
   Dsn(RLSADSW.VF04D.TELLCTRL Uow(ADD18C2DA4D5FC03)
   Dsn(RLSADSW.VF04D.
waits for indoubt resolution before allowing general access to the data set. In such a situation you can still release the locks immediately, using the SET DSNAME command, although in most cases you will lose data integrity. See “Lost locks recovery” on page 89 for more information about resolving indoubt UOWs following lost locks processing.
  ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
8. When all SMSVSAM servers were down, we deleted the IGWLOCK00 lock structure with the MVS command:
  VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
9. We restarted the SMSVSAM servers with the MVS command:
  ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE
CICS was informed during dynamic RLS restart about the data sets for which it must perform lost locks recovery.
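The three MVS commands quoted in this example must be issued in a fixed order, and the lock structure must not be deleted while any SMSVSAM server is still active. The Python sketch below captures that ordering as a small decision function. It is an illustration only: the `next_command` function and its inputs are hypothetical, and in practice an operator issues these commands and checks server status by hand.

```python
TERMINATE = "ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER"
DELETE_LOCKS = "VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE"
RESTART = "ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE"


def next_command(active_servers: set, lock_structure_deleted: bool) -> str:
    """Return the next command in the sequence.

    active_servers: names of systems whose SMSVSAM server is still up
                    (hypothetical input, gathered by the operator).
    """
    if not lock_structure_deleted:
        if active_servers:
            # Servers are still up: terminate them and wait before deleting.
            return TERMINATE
        # All servers are down: the lock structure can now be deleted.
        return DELETE_LOCKS
    # The lock structure is gone: restart the servers.
    return RESTART


print(next_command({"MVS1", "MVS2"}, False))
print(next_command(set(), False))
print(next_command(set(), True))
```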
that before running SHCDS CFREPAIR, the restored user catalog must be import connected to the master catalog on all systems (see the “Recovering Shared Catalogs” topic in DFSMS/MVS Managing Catalogs). Forward recovery of data sets accessed in non-RLS mode For data sets accessed in non-RLS mode, use the following forward recovery procedure: 1. Close all files Close all the files that are open against the failed data set, by issuing CEMT, or EXEC CICS, SET FILE(...
In these cases, you can resolve the cause of the failure and try the whole process again. This topic describes what to do when the failure in forward recovery cannot be resolved. In this case, where you are unsuccessful in applying all the forward recovery log data to a restored backup, you are forced to abandon the forward recovery, and revert to your most recent full backup.
1) Force shunted indoubt units of work using SET DSNAME(...) UOWACTION(COMMIT | BACKOUT | FORCE). Before issuing the next command, wait until the SET DSNAME(...) UOWACTION has completed against all shunted indoubt units of work. If the UOWACTION command for an indoubt unit of work results in backout, this will fail for the data set that is being restored, because it is still in a recovery-required state.
Procedure for failed non-RLS mode forward recovery operation If you are not successful in applying all the forward recovery log data to a restored backup, you are forced to abandon the forward recovery, and revert to your most recent full backup.
Chapter 18. Backup-while-open (BWO) The BWO facility, together with other system facilities and products, allows you to take a backup copy of a VSAM data set while it remains open for update. Many CICS applications depend on their data sets being open for update over a long period of time. Normally, you cannot take a backup of the data set while the data set is open.
forward-recovery logs. Long-running transactions, automated teller machines, and continuously available applications require the database to be up and running when the backup is being taken. The concurrent copy function used along with BWO by DFSMSdss allows backups to be taken with integrity even when control-area and control-interval splits and data set additions (new extents or add-to-end) are occurring for VSAM key sequenced data sets.
Hardware requirements The concurrent copy function is supported by the IBM® 3990 Model 3 with the extended platform and the IBM 3990 Model 6 control units. Which data sets are eligible for BWO You can use BWO only for: v Data sets that are on SMS-managed storage and that have an integrated catalog facility (ICF) catalog. v VSAM data sets accessed by CICS file control and for the CICS system definition (CSD) file. ESDS, KSDS, and RRDS are supported.
How you request BWO You can define files as eligible for BWO in one of two ways. Procedure Decide which method you want to use for data sets: v If your data set is accessed in RLS mode, you must define the BWO option in the ICF catalog. Defining BWO in the ICF catalog requires DFSMS 1.3. v If your data set is accessed only in non-RLS mode, you can define the BWO option in either the ICF catalog or the CICS file definition. Defining BWO in the ICF catalog requires DFSMS 1.3.
v But if you specify BWO(TYPECICS), and the PTF has not been applied, and you have not specified LOG(ALL) and a forward recovery log stream name, BWO processing for RLS remains disabled for such files. To achieve BWO for the file, you must either: – apply the PTF, – or specify LOG(ALL) and a forward recovery log stream name (if those actions are appropriate for the file in question).
Removing BWO attributes If you want to remove BWO attributes from your data sets, you must follow the correct procedure to avoid problems when taking subsequent backups. Procedure 1. Close the VSAM data set either by shutting down CICS normally or by issuing the command CEMT SET FILE CLOSED. Do not perform an immediate shutdown, because in that case CICS does not close the files and the BWO status is not reset. The BWO status of your data sets will not be correct when you restart CICS. 2.
After an uncontrolled or immediate shutdown, further BWO backups might be taken by DFSMShsm, because the BWO status in the ICF catalog is not reset. These backups should be discarded; only the non-BWO backups taken at the end of the batch window should be used during forward recovery, together with the CICS forward recovery logs.
Each of these operations is discussed in the following sections. File opening Different processing is done for each of the three cases when a file is opened for an update.
v If the file was defined with BACKUPTYPE(STATIC) and the ICF catalog indicates that the data set is already ineligible for BWO, CICS sets the BACKUPTYPE attribute in the DSNB to indicate ineligibility for BWO. However, if the ICF catalog indicates that the data set is currently eligible for BWO, IGWABWO makes it ineligible for BWO and sets the recovery point to the current time. CICS issues a message, and you can discard any BWO backups already taken in a previous batch window.
v If the file was defined with BACKUPTYPE(STATIC) and the ICF catalog indicates that the data set is already ineligible for BWO, the ICF catalog is not updated. However, if the ICF catalog indicates that the data set is currently eligible for BWO, IGWABWO makes it ineligible for BWO and sets the recovery point to the current time. CICS issues an attention message; you should discard any BWO backup copies already taken in a previous batch window.
Shutdown and restart The way CICS closes files is determined by whether the shutdown is controlled, immediate, or uncontrolled. Controlled shutdown During a controlled shutdown, CICS closes all open files defined in the FCT. This ensures that, for files that are open for update and eligible for BWO, the BWO attributes in the ICF catalog are set to a ‘BWO disabled’ state. If a failure occurs during shutdown so that CICS is unable to close a file, CICS issues warning message DFHFC5804.
When you use DFSMShsm, you still use DFSMSdss as the data mover. You can specify this using the DFSMShsm SETSYS command: SETSYS DATAMOVER(DSS) The DFSMS processing at the start of backup is dependent on the DFSMS release level. For releases before DFSMS 1.2, DFSMSdss first checks the BWO attributes in the ICF catalog to see if the data set is eligible for BWO. If it is, the backup is made without attempting to obtain exclusive control and serialize updates to this data set. For DFSMS 1.
DFSMSdfp must now disallow the pending change to ‘BWO enabled’ (and DFSMSdss must fail the backup) because, if the split did not finish before the end of the backup, the invalid backup would not be discarded. v From ‘BWO disabled and VSAM split occurred’ to ‘BWO enabled’. This state change could be attempted if: 1. At the start of data set backup processing, a request is issued to change the ‘BWO enabled and VSAM split occurred’ state to the ‘BWO enabled’ state. 2.
each CICS allows all units of work with updates for the data set to complete, then writes the tie-up records to the forward recovery log and the log of logs, and replies to DFSMSdss. For BWO backups, it is usually not necessary for the forward recovery utility to process a log from file-open time. Therefore, the tie-up records for all open files are written regularly on the log during activity-keypoint processing, and the time that they are written is recorded.
The forward recovery utility should ALLOCATE, with DISP=OLD, the data set that is to be recovered. This prevents other jobs accessing a back level data set and ensures that data managers such as CICS are not still using the data set. Before the data set is opened, the forward recovery utility should set the BWO attribute flags to the ‘Forward recovery started but not ended’ state. This prevents DFSMShsm taking BWO backups while forward recovery is in progress.
An assembler program that calls DFSMS callable services
*ASM XOPTS(CICS,NOEPILOG,SP)
*
* A program that can be run as a CICS transaction to Read and Set
* the BWO Indicators and BWO Recovery Point via DFSMS Callable
* Services (IGWABWO).
*
* Invoke the program via a CICS transaction as follows:
*
*       Rxxx 'data_set_name'
*       Sxxx 100 'data_set_name'
*
* Where:
*   Rxxx and Sxxx are the names of the transactions that will invoke
*   this program. Specify Rxxx to read and Sxxx to set the BWO
*   attributes.
         DS    30C
DATEVAL  DS    8C            Date value from BWO recovery point
SUCMSG1  DS    8C            Message text
TIMEVAL  DS    8C            Time value from BWO recovery point
SUCMSG2  DS    C             Message text
READMSG  DS    0CL11         If function = READ put out BWO flags
         DS    7C            Message text
BWOVAL1  DS    C             BWO indicator 1
BWOVAL2  DS    C             BWO indicato
BWOVAL3  DS    C
         DS    C
*
DATETIME DS    D
*
RECOVPT  DS    0D
DTZERO   DS    B
DTCENTRY DS    B
DTDATE   DS    5B
DTSIGN1  DS    B
*
DTTIME   DS    6B
DTTENTHS DS    B
DTSIGN2  DS    B
*
RECOVPTP DS    0D
DATEPACK DS    F
TIMEPACK DS    F
*
         DFHREGS
PROG     CSECT
PROG     AMODE 31
*
         MVC   BWOFLAGS(12),ZEROES
         LA    R4,1(0)
         CLI   BWOC1,C'0'          Set BWO indicator 1 if required
         BE    PRGBIT2
         ST    R4,BWOF1
PRGBIT2  DS    0H
         CLI   BWOC2,C'0'          Set BWO indicator 2 if required
         BE    PRGBIT3
         ST    R4,BWOF2
PRGBIT3  DS    0H
         CLI   BWOC3,C'0'          Set BWO indicator 3 if required
         BE    PRGCONT
         ST    R4,BWOF3
         B     PRGCONT
PRGREAD  DS    0H
         CLI   TRANFUNC,C'R'       If tran id not R or S then abort
         BNE   PRGABORT
*
* Set up the parameters for a read call
*
         SR    R4,R4
         LA    R4,READ(0)
         ST    R4,FUNC             Set function
         MVC   DSN(44),DSNAMER     Set data set name
         LH    R4,INLENGTH         S
         MVC   SUCMSG1(8),SUCTXT1
         MVC   SUCMSG2(1),SUCTXT2
         UNPK  KEYWORK(9),BWOTIME(5)     Make date printable
         TR    KEYWORK(8),HEXTAB-C'0'
         MVC   DATEVAL(8),KEYWORK
         UNPK  KEYWORK(9),BWOTIME+4(5)   Make time printable
         TR    KEYWORK(8),HEXTAB-C'0'
         MVC   TIMEVAL(8),KEYWORK
         CLI   TRANFUNC,C'S'             If READ then print BWO flags
         BNE   PRGREADO
*
* Got all the info we need, so put it out and exit
*
         EXEC CICS SEND TEXT FROM(SUCMSG) LENGTH(55) ERASE WAIT
*
         B     PRGEXIT
*
* It's a read so we also need the BWO flags for output
*
PRGREADO DS    0H
         MVC   READMSG(11
Chapter 19. Disaster recovery If your CICS system is normally available about 99 percent of the time, it would be wise to look at your disaster recovery plan. The same pressure that drives high availability drives the need for timely and current disaster recovery. You must plan what level of disaster recovery you require for your CICS environment.
acceptable. If you are located in an area prone to hurricanes or earthquakes, for example, a disaster recovery site next door would be pointless. When you are planning for disaster recovery, consider the cost of being unable to operate your business for a period of time. You have to consider the number of lost transactions, and the future loss of business as your customers go elsewhere.
v How critical and sensitive your business processes are: the more critical they are, the more frequently testing may be required. Six tiers of solutions for off-site recovery One blueprint for recovery planning describes a scheme consisting of six tiers of off-site recoverability (tiers 1-6), with a seventh tier (tier 0) that relies on local recovery only, with no off-site backup.
Figure 18. Disaster recovery tier 1: physical removal
(Diagram not reproduced. Points listed in the figure under Approach and Recovery: backups kept offsite; procedures and inventory offsite; recovery by installing the required hardware, restoring the system and data, and reconnecting to the network; relatively low cost; difficult to manage; most important applications resolved first; recovery possible, but may take a long time.)
Your disaster recovery plan has to include information to guide the staff responsible for recovering your system, from hardware requirements to day-to-day operations.
Tier 1 Tier 1 provides a very basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount. However, tier 1 allows you to recover and provide some form of service at low cost. You must assess whether the loss of data and the time taken to restore a service will prevent your company from continuing in business. Tier 2 - physical removal with hot site Tier 2, like tier 1, provides a very basic level of disaster recovery.
The advantage of tier 3 is that you should be able to provide a service to your users quite rapidly. You must assess whether the loss of data will prevent your company from continuing in business. Figure 20 summarizes the tier 3 solution.
Figure 20 (diagram not reproduced; it shows a standby site). Points listed in the figure under Approach and Recovery: backups kept off-site; procedures and inventory off-site; recovery by restoring the system and data and reconnecting to the network; standby site, plus bulk data transfer costs; recovery in hours.
Figure 21. Disaster recovery tier 0-3: summary of solutions
(Diagram not reproduced. For tiers 0 to 3 the figure lists these approaches and recovery times: no offsite data (none); truck access method tapes to cold site (week +); truck access method and hot site (1 day +); electronic vaulting (< 1 day). The tools named are DFSMSdss, DFSMShsm (ABARS), ESCON, concurrent copy, and SMS tape.)
The advantage of these methods is their low cost. The disadvantages of these methods are:
v Recovery is slow, and it can take days or weeks to recover.
Figure 22 (diagram not reproduced; it shows Site One and Site Two linked through VTAM, 3745 and 3990 units, a channel extender, and ESCON). Points listed in the figure under Approach and Recovery: workload may be shared; site one backs up site two and the reverse; critical applications and data are online; switch network; recover other applications; continuous transmission of data; dual online for critical data; network switching capability; recovery in minutes to hours.
v Cost is higher than for the tier 1 to 3 solutions, because you need dedicated hardware, software, and communication links. Tier 5 - two-site, two-phase commit A tier 5 solution is appropriate for a custom-designed recovery plan with special applications. Because these applications must be designed to use this solution, it cannot be implemented at most CICS sites. Figure 23 summarizes the tier 5 solution.
Figure 24 summarizes the tier 6 solution.
Figure 24. Disaster recovery tier 6: minimal to zero data loss
(Diagram not reproduced; it shows Site One and Site Two linked through VTAM, 3745 and 3990 units, a channel extender, and ESCON. Points listed in the figure under Approach and Recovery: local and remote copies updated; dual online storage; network switching capability; most expensive; instantaneous recovery; non-disruptive terminal switch.)
Tier 6, minimal to zero data loss, is the ultimate level of disaster recovery. There are two tier 6 solutions, one hardware-based and the other software-based.
support the XRC DFSMS/MVS host, and one for the recovery 3990, this allows a total of 86 km (53.4 miles) between the 3990s. If you use channel extenders with XRC, there is no limit on the distance between your primary and remote site. For RRDF there is no limit to the distance between the primary and secondary sites. Tier 4–6 solutions This summary shows the three tiers and the various tools for each that can help you reach your required level of disaster recovery.
Disaster recovery and high availability This topic describes the tier 6 solutions for high availability and data currency when recovering from a disaster. Peer-to-peer remote copy (PPRC) and extended remote copy (XRC) PPRC and XRC are both 3990-6 hardware solutions that provide data currency to secondary, remote volumes. Updates made to secondary DASD are kept in time sequence. This ensures that updates are applied consistently across volumes.
v IMS write-ahead data set (WADS) and IMS online log data set (OLDS)
v ACBLIB for IMS
v Boot-strap data set (BSDS), the catalog and the directory for DB2
v DB2 logs
v Any essential non-database volumes
CICS applications can use non-DASD storage for processing data. If your application depends on this type of data, be aware that PPRC and XRC do not handle it. For more information on PPRC and XRC, see Planning for IBM Remote Copy, SG24-2595-00, and DFSMS/MVS Remote Copy Administrator's Guide and Reference.
where there is a high volume of transactions, but each transaction is typically less than 200 dollars in value. Other benefits of PPRC and XRC PPRC or XRC may eliminate the need for disaster recovery backups to be taken at the primary site, or to be taken at all. PPRC allows you to temporarily suspend the copying of updates to the secondary site. This allows you to suspend updates at the secondary site so that you can make image copies or backups of the data there.
between the primary and secondary sites is interrupted. Remote logging is only as effective as the currency of the data that is sent off-site. RRDF transports log stream data to a remote location in real-time, within seconds of the log operation at the primary site. When the RRDF address space at the remote site receives the log data, it formats it into archived log data sets. Once data has been stored at the remote site, you can use it as needed to meet business requirements.
You should ensure that a senior manager is designated as the disaster recovery manager. The recovery manager must make the final decision whether to switch to a remote site, or to try to rebuild the local system (this is especially true if you have adopted a solution that does not have a warm or hot standby site). You must decide who will run the remote site, especially during the early hours of the disaster.
CICS VSAM Recovery QSAM copy

CICS VSAM Recovery (CICS VR) provides a QSAM copy function that can copy MVS log streams to a QSAM data set. Copies of the QSAM data can be sent either electronically or physically to the remote site. On arrival at the remote site, you can use the MVS system logger import services to put the log records into an MVS system logger log stream. Alternatively, you can use CICS VR to perform forward recovery of a data set using the QSAM data directly.
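The idea behind forward recovery from shipped log data can be illustrated with a minimal Python sketch. It is a conceptual model only: the `LogRecord` layout, the field names, and the `forward_recover` function are hypothetical and do not represent the CICS VR utilities, the QSAM copy format, or the MVS system logger interfaces. The sketch assumes that each log record carries the after-image of one update and that updates are reapplied to a restored backup copy in their original sequence.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical after-image record; not a real CICS VR or logger format."""
    sequence: int                # position of the update in the original log stream
    key: str                     # record key in the data set
    after_image: Optional[str]   # new record contents, or None for a delete


def forward_recover(backup: Dict[str, str],
                    log: Iterable[LogRecord]) -> Dict[str, str]:
    """Return a copy of the backup with every logged update reapplied in order."""
    recovered = dict(backup)     # leave the backup copy itself untouched
    for record in sorted(log, key=lambda r: r.sequence):
        if record.after_image is None:
            recovered.pop(record.key, None)   # the update was a delete
        else:
            recovered[record.key] = record.after_image
    return recovered


# Backup copy restored at the remote site, plus log records that arrived
# out of order from the primary site.
backup_copy = {"A001": "balance=100", "A002": "balance=250"}
shipped_log = [
    LogRecord(2, "A003", "balance=75"),
    LogRecord(1, "A001", "balance=60"),
    LogRecord(3, "A002", None),
]
recovered = forward_recover(backup_copy, shipped_log)
print(recovered)
```

Sorting by sequence number before applying the records reflects the point that the recovered data is only correct if updates are replayed in the order in which they were originally logged.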
If a disaster occurs at the primary site, your disaster recovery procedures should include recovery of VSAM data sets at the designated remote recovery site. You can then emergency restart the CICS regions at the remote site so that they can back out any uncommitted data. Special support is needed for RLS because record locks, which were protecting uncommitted data from being updated by other transactions at the primary site, are not present at the remote site.
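The effect of backing out uncommitted data during an emergency restart can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how CICS implements backout: the `LogEntry` layout, the `kind` values, and the `back_out_uncommitted` function are hypothetical. The sketch assumes that the log holds a before-image for every update and a commit record for every completed unit of work, and that any unit of work without a commit record was in flight at the time of the failure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical system log entry; real CICS log records look different."""
    uow: str                             # unit-of-work identifier
    kind: str                            # "update" or "commit"
    key: str = ""                        # record key (updates only)
    before_image: Optional[str] = None   # contents before the update,
                                         # or None if the record was new


def back_out_uncommitted(data: Dict[str, str],
                         log: List[LogEntry]) -> Dict[str, str]:
    """Undo, newest first, every update made by a unit of work that never committed."""
    committed = {entry.uow for entry in log if entry.kind == "commit"}
    restored = dict(data)
    for entry in reversed(log):          # read the log backward
        if entry.kind != "update" or entry.uow in committed:
            continue
        if entry.before_image is None:
            restored.pop(entry.key, None)     # record did not exist before
        else:
            restored[entry.key] = entry.before_image
    return restored


# State of the data at the moment of failure, and the log that led to it.
data_at_failure = {"A001": "balance=60", "A002": "balance=300", "A003": "new"}
system_log = [
    LogEntry("U1", "update", "A001", "balance=100"),
    LogEntry("U1", "commit"),
    LogEntry("U2", "update", "A002", "balance=250"),
    LogEntry("U2", "update", "A002", "balance=270"),
    LogEntry("U2", "update", "A003", None),
]
after_backout = back_out_uncommitted(data_at_failure, system_log)
print(after_backout)
```

In this model the committed update to A001 survives, while both updates to A002 and the new record A003 are undone. The sketch has no notion of locks; in the situation described above, the records touched by unit of work U2 are exactly the ones that record locks would have protected at the primary site until backout completed.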
Part 4. Appendixes
Notices

This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used.
Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Programming License Agreement, or any equivalent agreement between us.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp.
Bibliography

CICS books for CICS Transaction Server for z/OS

General

CICS Transaction Server for z/OS Program Directory, GI13-0536
CICS Transaction Server for z/OS What's New, GC34-6994
CICS Transaction Server for z/OS Upgrading from CICS TS Version 2.3, GC34-6996
CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.1, GC34-6997
CICS Transaction Server for z/OS Upgrading from CICS TS Version 3.
CICS Shared Data Tables Guide, SC34-7017

CICSPlex SM books for CICS Transaction Server for z/OS

General

CICSPlex SM Concepts and Planning, SC34-7044
CICSPlex SM Web User Interface Guide, SC34-7045

Administration and Management

CICSPlex SM Administration, SC34-7005
CICSPlex SM Operations Views Reference, SC34-7006
CICSPlex SM Monitor Views Reference, SC34-7007
CICSPlex SM Managing Workloads, SC34-7008
CICSPlex SM Managing Resource Usage, SC34-7009
CICSPlex SM Managing Business Applications, SC34-7010

Pr
Accessibility

Accessibility features help a user who has a physical disability, such as restricted mobility or limited vision, to use software products successfully.

You can perform most tasks required to set up, run, and maintain your CICS system in one of these ways:

• using a 3270 emulator logged on to CICS
• using a 3270 emulator logged on to TSO
• using a 3270 emulator as an MVS system console

IBM Personal Communications provides 3270 emulation with accessibility features for people with disabilities.