Specifications

LifeKeeper I-O Fencing Introduction
I/O fencing is the locking away of data from a malfunctioning node, preventing uncoordinated access
to shared storage. In an environment where multiple servers can access the same data, it is essential
that all writes are performed in a controlled manner to avoid data corruption. Problems can arise when
the failure detection mechanism breaks down because the symptoms of this breakdown can mimic a
failed node. For example, in a two-node cluster, if the connection between the two nodes fails, each
node would “think” the other has failed, causing both to attempt to take control of the data, resulting in
data corruption. I/O fencing removes this data corruption risk by blocking access to data from specific
nodes.
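The two-node example above can be illustrated with a small toy model. This is only a sketch for intuition, not LifeKeeper code: the `SharedStorage` class, the node names "A" and "B", and the rule that node "B" is the one fenced are all assumptions made for illustration.

```python
class SharedStorage:
    """Toy shared disk that refuses writes from fenced nodes (hypothetical model)."""

    def __init__(self):
        self.fenced = set()   # nodes currently blocked from the storage
        self.writers = []     # nodes whose writes were accepted, in order

    def fence(self, node):
        self.fenced.add(node)

    def write(self, node):
        # A fenced node's write is rejected; any other write is accepted.
        if node in self.fenced:
            return False
        self.writers.append(node)
        return True


def simulate_link_failure(use_fencing):
    """The interconnect fails, so each node believes the other has failed
    and tries to take control of the data.

    Returns True if more than one node wrote (uncoordinated access).
    """
    storage = SharedStorage()
    if use_fencing:
        # Assumption for this sketch: node "B" is the one fenced off.
        storage.fence("B")
    for node in ("A", "B"):
        storage.write(node)
    return len(set(storage.writers)) > 1


print("without fencing, both nodes write:", simulate_link_failure(False))
print("with fencing, both nodes write:", simulate_link_failure(True))
```

Without fencing, both nodes write to the storage after the link failure. With fencing, only one node's writes are accepted, which is the data corruption risk that I/O fencing removes.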
Disabling Reservations
While reservations provide the highest level of data protection for shared storage, in some cases
reservations are not available and must be disabled within LifeKeeper. With reservations
disabled, the storage no longer acts as an arbitrator in cases where multiple systems attempt to
access the storage, intentionally or unintentionally.
Consider other methods of fencing the storage through cluster membership. Such fencing is needed
to handle system hangs, busy systems, and any other situation in which a server can appear not to
be alive.
The key to a reliable configuration without reservations is to know that when a failover occurs, the
“other” server has been powered off or power cycled. There are four fencing options that help
accomplish this, allowing LifeKeeper to provide a very reliable configuration, even without SCSI
reservations. These include the following:
• STONITH (Shoot the Other Node in the Head) using a highly reliable interconnect, i.e., a serial
connection between server and STONITH device. STONITH is the technique of physically
disabling or powering off a server when it is no longer considered part of the cluster. LifeKeeper
supports the ability to power off servers during a failover event, thereby ensuring safe access to
the shared data. This option provides reliability similar to reservations but is limited to two
nodes physically located together.
• Quorum/Witness: Quorum/witness servers are used to confirm membership in the cluster,
especially when the cluster servers are at different locations. While this option can handle
split-brain, it is not recommended on its own because it does not handle system hangs.
• Watchdog: A watchdog monitors the health of a server. If a problem is detected, the server
with the problem is rebooted or powered down. This option can recover from a server hang;
however, it does not handle split-brain, so this option alone is also not recommended.
• CONFIRM_SO: This option requires that automatic failover be turned off, so while very
reliable (depending upon the knowledge of the administrator), it does not provide the same availability.
While none of these alternative fencing methods alone is likely to be adequate, when used in
combination, a very reliable configuration can be obtained.
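The point about combining options can be sketched as a simple coverage check. The table below encodes only what the list above states: quorum/witness handles split-brain but not system hangs, and watchdog handles hangs but not split-brain. STONITH and CONFIRM_SO are left out because the text does not break their coverage down by failure mode. The names `COVERAGE`, `REQUIRED`, and `uncovered` are hypothetical and are not part of LifeKeeper.

```python
# Failure modes each option is described as handling in the list above.
# Illustrative only; these labels are not LifeKeeper settings.
COVERAGE = {
    "quorum_witness": {"split_brain"},
    "watchdog": {"system_hang"},
}

# Failure modes a configuration without reservations must handle.
REQUIRED = {"split_brain", "system_hang"}


def uncovered(options):
    """Return the required failure modes that the chosen options leave unhandled."""
    covered = set()
    for option in options:
        covered |= COVERAGE[option]
    return REQUIRED - covered


for choice in (["quorum_witness"], ["watchdog"], ["quorum_witness", "watchdog"]):
    print(choice, "leaves unhandled:", sorted(uncovered(choice)))
```

Each option on its own leaves one failure mode unhandled, while the two together leave none, which matches the recommendation to use the methods in combination.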
SteelEye Protection Suite for Linux