HP Serviceguard Extended Distance Cluster for Linux A.11.20.10 Deployment Guide, December 2012

Table 4 Disaster Scenarios and Their Handling (continued)
Recovery Process
What Happens When This Disaster Occurs
Disaster Scenario
Disaster Scenario

This is a multiple failure scenario in which the failures occur in a
particular sequence, in the configuration that corresponds to Figure 2,
where the Ethernet and FC links do not go over DWDM. The RPO_TARGET for
package P1 is set to IGNORE.

The package is running on N1. P1 uses a mirror, md0, consisting of S1
(local to node N1, /dev/hpdev/mylink-sde) and S2 (local to node N2). The
first failure occurs when all FC links between the two data centers fail,
causing N1 to lose access to S2 and N2 to lose access to S1.

After some time, a second failure occurs: N1 fails because of a power
failure.

What Happens When This Disaster Occurs

Package P1 continues to run on N1 after the first failure, with md0
consisting of only S1.

After the second failure, P1 fails over to N2 and starts with S2. Data that
was written to S1 after the FC link failure is lost because RPO_TARGET was
set to IGNORE.

Recovery Process

In this scenario, no attempt is made to repair the first failure until the
second failure occurs. Typically, the second failure occurs before the first
failure is repaired.

1. To recover from the first failure, restore the FC links between the data
   centers. As a result, S1 is accessible from N2.
2. Run the following command to add S1 to md0 on N2:
   # mdadm --add /dev/md0 /dev/hpdev/mylink-sde
   This command initiates the re-mirroring process. When it is complete, the
   extended distance cluster detects S1 and accepts it as part of md0.

For the second failure, restore N1. Once it is restored, it joins the
cluster and can access S1 and S2.

1. Run the following command to enable P1 to run on N1:
   # cmmodpkg -e P1 -n N1
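
The two recovery commands for this scenario can be collected into a small dry-run helper. This is only an illustrative sketch: the function name build_recovery_plan and its parameters are hypothetical, it assumes the FC links and N1 have already been restored by hand, and it only assembles the commands quoted in this table as argument lists without executing them.

```python
from typing import List


def build_recovery_plan(md_device: str, returning_disk: str,
                        package: str, node: str) -> List[List[str]]:
    """Return the recovery commands from this table as argv lists (dry run).

    Hypothetical helper: nothing is executed here. The first command is
    meant to be run on the node that currently holds the mirror (N2 in
    this scenario), the second once the restored node has rejoined.
    """
    return [
        # First failure: re-add the returning mirror half to the MD device,
        # which starts the re-mirroring process.
        ["mdadm", "--add", md_device, returning_disk],
        # Second failure: enable the package to run on the restored node,
        # using the command form shown in this table.
        ["cmmodpkg", "-e", package, "-n", node],
    ]


if __name__ == "__main__":
    plan = build_recovery_plan("/dev/md0", "/dev/hpdev/mylink-sde", "P1", "N1")
    for argv in plan:
        # Print with the same "# " root prompt used in this guide.
        print("# " + " ".join(argv))
```

Keeping the plan as data rather than running it directly lets an operator review the exact commands and confirm that re-mirroring has completed before moving on to the package step.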
Disaster Scenario

This scenario is the same as the previous one, except that package P1 is
configured with RPO_TARGET set to 60 seconds.

Initially, P1 is running on N1. P1 uses a mirror, md0, consisting of S1
(local to node N1, /dev/hpdev/mylink-sde) and S2 (local to node N2). The
first failure occurs when all FC links between the two data centers fail,
causing N1 to lose access to S2 and N2 to lose access to S1.

After the package resumes activity and runs for 20 seconds, a second failure
occurs, causing N1 to fail, perhaps due to a power failure.

What Happens When This Disaster Occurs

Package P1 continues to run on N1 after the first failure, with md0
consisting of only S1.

After the second failure, P1 fails over to N2 and starts with S2. This
happens because disk S2 is non-current by less than 60 seconds, the time
limit set by the RPO_TARGET parameter. Disk S2 holds older data than the
other mirror half, S1, so all data that was written to S1 after the FC link
failure is lost.

Recovery Process

In this scenario, no attempt is made to repair the first failure until the
second failure occurs. Typically, the second failure occurs before the first
failure is repaired.

1. To recover from the first failure, restore the FC links between the data
   centers. As a result, S1 (/dev/hpdev/mylink-sde) is accessible from N2.
2. Run the following command to add S1 to md0 on N2:
   # mdadm --add /dev/md0 /dev/hpdev/mylink-sde
   This command initiates the re-mirroring process. When it is complete, the
   extended distance cluster detects S1 and accepts it as part of md0 again.

For the second failure, restore N1. Once it is restored, it joins the
cluster and can access S1 and S2.

1. Run the following command to enable P1 to run on N1:
   # cmmodpkg -e P1 -n N1
What Happens When This Disaster Occurs

Package P1 continues to run on N1 after the first failure, with md0
consisting of only S1.

After the second failure, the package does not start on N2. When it tries to
start with only S2 on N2, it detects that S2 has been non-current for longer
than the value of RPO_TARGET.
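
The start-up outcomes described in these scenarios follow one rule: compare how long the surviving mirror half has been non-current with RPO_TARGET. The sketch below models that rule for illustration only. The function name may_start_on_stale_mirror is hypothetical, this is not the Serviceguard implementation, and treating a non-current time exactly equal to RPO_TARGET as acceptable is an assumption, because the text only covers the "less than" and "greater than" cases.

```python
from typing import Union

IGNORE = "IGNORE"  # RPO_TARGET setting used in the first scenario


def may_start_on_stale_mirror(non_current_seconds: float,
                              rpo_target: Union[int, float, str]) -> bool:
    """Model of whether a package may start on a non-current mirror half.

    Illustrative only. The behaviour at exact equality is an assumption.
    """
    if isinstance(rpo_target, str):
        if rpo_target == IGNORE:
            # IGNORE: the package starts regardless of how stale the disk is,
            # so data written to the other half after the failure is lost.
            return True
        raise ValueError("unsupported RPO_TARGET value: " + rpo_target)
    # Numeric RPO_TARGET: refuse to start once the disk has been
    # non-current for longer than the configured number of seconds.
    return non_current_seconds <= rpo_target


if __name__ == "__main__":
    # Second scenario: S2 is 20 seconds behind, RPO_TARGET is 60 seconds.
    print(may_start_on_stale_mirror(20, 60))
    # A non-current time above RPO_TARGET (90 is an illustrative value).
    print(may_start_on_stale_mirror(90, 60))
    # First scenario: RPO_TARGET is IGNORE.
    print(may_start_on_stale_mirror(3600, IGNORE))
```

The 20-second and 60-second values come from the second scenario; the 90-second and 3600-second values are made up to exercise the other two branches.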