HP Serviceguard Extended Distance Cluster for Linux A.11.20.10 Deployment Guide, December 2012

Table 4 Disaster Scenarios and Their Handling (continued)

Recovery Process | What Happens When This Disaster Occurs | Disaster Scenario

Recovery Process

In this scenario, no attempts are made to repair the first failure until the second failure occurs. Typically, the second failure occurs before the first failure is repaired.

1. To recover from the first failure, restore the FC links between the data centers. As a result, S1 is accessible from N2.

2. Run the following command to add S1 to md0 on N2:

# mdadm --add /dev/md0 /dev/hpdev/mylink-sde

This command initiates the re-mirroring process. When it is complete, the extended distance cluster detects S1 and accepts it as part of md0.
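Because re-mirroring takes time, it can help to check whether it is still running before moving on. The following is a minimal sketch, not part of the Serviceguard tooling: it assumes the standard Linux /proc/mdstat layout, and the function name md_sync_state is hypothetical.

```shell
#!/bin/sh
# Report whether an md array is still re-mirroring, based on /proc/mdstat text.
# Usage: md_sync_state <array-name> [mdstat-file]
# Prints one of: resyncing, idle, unknown.
# "idle" only means no recovery or resync is in progress; the array may
# still be degraded, so this does not replace a full status check.
md_sync_state() {
    array=$1
    mdstat=${2:-/proc/mdstat}
    awk -v dev="$array" '
        # A line such as "md0 : active raid1 ..." opens the stanza for that array.
        $1 == dev && $2 == ":" { in_dev = 1; found = 1; next }
        # Any other "name : ..." line or a blank line closes the stanza.
        $2 == ":" { in_dev = 0 }
        NF == 0   { in_dev = 0 }
        # Progress lines contain "recovery = N%" or "resync = N%".
        in_dev && /(recovery|resync) *=/ { state = "resyncing" }
        END {
            if (!found) { print "unknown"; exit }
            print (state ? state : "idle")
        }
    ' "$mdstat"
}
```

On N2, running `md_sync_state md0` after the `mdadm --add` command would print `resyncing` while the mirror is being rebuilt.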

For the second failure, restore N1. Once it is restored, it joins the cluster and can access S1 and S2.

1. Run the following command to enable P1 to run on N1:

# cmmodpkg -e P1 -n N1

What Happens When This Disaster Occurs

The package (P1) continues to run on N1 after the first failure, with md0 consisting of only S1.

After the second failure, the package P1 fails over to N2 and starts with S2. Data that was written to S1 after the FC link failure is now lost because the RPO_TARGET was set to IGNORE.

Disaster Scenario

This is a multiple failure scenario where the failures occur in a particular sequence in the configuration that corresponds to figure 2, where Ethernet and FC links do not go over DWDM. The RPO_TARGET for the package P1 is set to IGNORE.

The package is running on N1. P1 uses a mirror md0 consisting of S1 (local to node N1, /dev/hpdev/mylink-sde) and S2 (local to node N2). The first failure occurs when all FC links between the two data centers fail, causing N1 to lose access to S2 and N2 to lose access to S1.

After some time, a second failure occurs: N1 fails (because of a power failure).

Recovery Process

In this scenario, no attempts are made to repair the first failure until the second failure occurs. Typically, the second failure occurs before the first failure is repaired.

1. To recover from the first failure, restore the FC links between the data centers. As a result, S1 (/dev/hpdev/mylink-sde) is accessible from N2.

2. Run the following command to add S1 to md0 on N2:

# mdadm --add /dev/md0 /dev/hpdev/mylink-sde

This command initiates the re-mirroring process. When it is complete, the extended distance cluster detects S1 and accepts it as part of md0 again.

For the second failure, restore N1. Once it is restored, it joins the cluster and can access S1 and S2.

1. Run the following command to enable P1 to run on N1:

# cmmodpkg -e P1 -n N1
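The two recovery commands above can be collected into a small dry-run helper so the sequence can be reviewed before anything is executed. This is only a sketch: the function names and the DRY_RUN variable are hypothetical, and the device, package, and node names are those used in the table. In practice the steps run at different times (the first on N2 after the FC links are restored, the second after N1 has rejoined the cluster), so the helper merely lists them in the order given.

```shell
#!/bin/sh
# Commands are only printed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# First-failure recovery, step 2: re-add mirror half S1 to the md device on N2.
readd_mirror_half() {
    run mdadm --add "$1" "$2"
}

# Second-failure recovery: allow the package to run on the restored node.
enable_package_on_node() {
    run cmmodpkg -e "$1" -n "$2"
}

# The sequence from the table, using its example names.
recover_sketch() {
    readd_mirror_half /dev/md0 /dev/hpdev/mylink-sde
    enable_package_on_node P1 N1
}
```

Calling `recover_sketch` with the default settings prints the two commands without running them.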

What Happens When This Disaster Occurs

Package P1 continues to run on N1 after the first failure, with md0 consisting of only S1.

After the second failure, package P1 fails over to N2 and starts with S2. This happens because the disk S2 is non-current by less than 60 seconds, the time limit set by the RPO_TARGET parameter. However, disk S2 has data that is older than the other mirror half, S1: all data that was written to S1 after the FC link failure is lost.

Disaster Scenario

This failure is the same as the previous failure except that the package (P1) is configured with RPO_TARGET set to 60 seconds.

In this case, initially the package (P1) is running on N1. P1 uses a mirror md0 consisting of S1 (local to node N1, /dev/hpdev/mylink-sde) and S2 (local to node N2). The first failure occurs when all FC links between the two data centers fail, causing N1 to lose access to S2 and N2 to lose access to S1.

After the package resumes activity and runs for 20 seconds, a second failure occurs, causing N1 to fail, perhaps due to a power failure.

What Happens When This Disaster Occurs

The package (P1) continues to run on N1 after the first failure, with md0 consisting of only S1.

After the second failure, the package does not start up on N2 because when it tries to start with only S2 on N2, it detects that S2 is non-current for a time period that is greater than the value of RPO_TARGET.
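The three outcomes above follow one rule: with RPO_TARGET set to IGNORE the package starts regardless of how stale the surviving mirror half is, and otherwise it starts only if the mirror half is non-current by no more than RPO_TARGET seconds. The sketch below models that rule as the table states it. It is not how Serviceguard implements the check, the function name is hypothetical, and treating a value exactly equal to RPO_TARGET as acceptable is an assumption, since the table only describes the "less than" and "greater than" cases.

```shell
#!/bin/sh
# Decide whether a package may start on a mirror half that is non-current.
# Usage: rpo_allows_start <rpo_target> <seconds_non_current>
#   rpo_target: IGNORE, or a number of seconds
# Returns 0 if the start is allowed, 1 if it is refused.
rpo_allows_start() {
    rpo_target=$1
    stale_seconds=$2
    if [ "$rpo_target" = "IGNORE" ]; then
        return 0    # staleness is not considered at all
    fi
    # Assumption: equality counts as within the target.
    [ "$stale_seconds" -le "$rpo_target" ]
}
```

For example, `rpo_allows_start 60 20` succeeds, which matches the case where the second failure occurs 20 seconds after the first. `rpo_allows_start 60 90` fails, where 90 is a hypothetical value standing in for any period longer than the target.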