
Table 4 Disaster Scenarios and Their Handling (continued)
Disaster Scenario
In this case, the package (P1) runs with RPO_TARGET set to 60 seconds. Initially, the package (P1) is running on node N1. P1 uses a mirror md0 consisting of S1 (local to node N1, for example /dev/hpdev/mylink-sde) and S2 (local to node N2). The first failure occurs when all FC links between the two data centers fail, causing N1 to lose access to S2 and N2 to lose access to S1. Immediately afterwards, a second failure occurs where node N1 goes down because of a power failure. After N1 is repaired and brought back into the cluster, package switching of P1 to N1 is enabled.
IMPORTANT: While it is not a good idea to enable package switching of P1 to N1, it is described here to show recovery from an operator error.
The FC links between the data centers are still not repaired, and N2 becomes inaccessible because of a power failure.

What Happens When This Disaster Occurs
When the first failure occurs, the package (P1) continues to run on N1 with md0 consisting of only S1. When the second failure occurs, the package fails over to N2 and starts with S2. When N2 fails, the package does not start on node N1, because a package is allowed to start only once with a single disk. You must repair this failure, and both disks must be synchronized and be part of the MD array before another failure of the same pattern occurs. In this failure scenario, only S1 is available to P1 on N1, because the FC links between the data centers are not repaired. Because P1 already started once with S2 on N2, it cannot start on N1 until both disks are available.

Recovery Process
Complete the following steps to initiate a recovery:
1. Restore the FC links between the data centers. As a result, S2 (/dev/hpdev/mylink-sdf) becomes available to N1, and S1 (/dev/hpdev/mylink-sde) becomes accessible from N2.
2. To start the package P1 on N1, check the package log file in the package directory and run the commands listed there to force a package start (see the example that follows this scenario).
When the package starts up on N1, it automatically adds S2 back into the array and the re-mirroring process starts. When re-mirroring is complete, the extended distance cluster detects and accepts S1 as part of md0.
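The following is a minimal sketch of how the re-mirroring triggered in step 2 can be observed from N1. The package name (P1) and device name (md0) are taken from this scenario; the authoritative force-start commands are only those printed in the package log, so the commands below are limited to inspecting status and monitoring the resync.

    # Confirm the current state of the package before and after the forced start.
    cmviewcl -v -p P1

    # Watch the md0 resync progress while the mirror is being rebuilt.
    cat /proc/mdstat

    # Verify that both mirror halves are active once the resync completes.
    mdadm --detail /dev/md0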
Disaster Scenario
In this case, initially the package (P1) is running on node N1. P1 uses a mirror md0 consisting of S1 (local to node N1, for example /dev/hpdev/mylink-sde) and S2 (local to node N2). The first failure occurs when all Ethernet links between the two data centers fail.

What Happens When This Disaster Occurs
With this failure, the heartbeat exchange between N1 and N2 is lost. As a result, both nodes try to reach the Quorum server. If N1 reaches the Quorum server first, the package continues to run on N1 with S1 and S2 while N2 is rebooted. If N2 reaches the Quorum server first, the package fails over to N2 and starts running with both S1 and S2, and N1 is rebooted.

Recovery Process
Complete the following steps to initiate a recovery:
1. Restore only the Ethernet links between the data centers so that N1 and N2 can exchange heartbeats.
2. After restoring the links, add the node that was rebooted back into the cluster. Run the cmrunnode command to add the node to the cluster (see the example that follows this scenario).
NOTE: If this failure is a precursor to a site failure, and the Quorum Service arbitration selects the site that is likely to fail, it is possible that the entire cluster will go down.
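As an illustration of step 2, the following is a minimal sketch of rejoining a rebooted node. It assumes N2 is the node that was rebooted; substitute the node that the Quorum server arbitration actually caused to reboot in your cluster.

    # Check which node is currently down.
    cmviewcl

    # Rejoin the rebooted node (N2 in this example) to the running cluster.
    cmrunnode N2

    # Verify that both nodes are up and that P1 is still running.
    cmviewcl -v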
Disaster Scenario
In this case, initially the package (P1) is running on node N1. P1 uses a mirror md0 consisting of S1 (local to node N1, for example /dev/hpdev/mylink-sde) and S2 (local to node N2). The first failure occurs when the Ethernet links from N1 to the Ethernet switch in data center 1 fail.

What Happens When This Disaster Occurs
With this failure, the heartbeat exchange between N1 and N2 is lost. N2 reaches the Quorum server, because it is the only node that still has access to the Quorum server. The package fails over to N2 and starts running with both S1 and S2 while N1 is rebooted.

Recovery Process
Complete the following procedure to initiate a recovery:
1. Restore the Ethernet links from N1 to the switch in data center 1.
2. After restoring the links, add the node that was rebooted back into the cluster. Run the cmrunnode command to add the node to the cluster (see the example that follows this scenario).
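A minimal sketch of this recovery is shown below. The interface name eth1 is an assumption used only for illustration; check the link state of whichever interfaces on N1 carry the heartbeat to the data center 1 switch.

    # On N1, confirm that the restored link reports "Link detected: yes".
    ethtool eth1

    # Rejoin N1 to the cluster after the link is back.
    cmrunnode N1

    # Confirm that N1 is running in the cluster and that P1 continues to run on N2.
    cmviewcl -v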