Troubleshooting guide

Chapter 9 Troubleshooting Active Network Management Fail-over in High Availability Applications
Advanced Technical Reference Guide 4.1 June 2000 103
Problems detected by the VPN/FireWall module should also be reported using the Active Check Device
Interface. One example is a check of whether the fwd daemon is running on each module.
How to check the module status using the cphaprob command
The cphaprob command can be used to register or unregister devices, to report problems, and to print the
list of currently registered devices and the state of each device. Devices are referred to by name. This name
also appears in the logs, so it should be meaningful and not too long (up to 16 characters).
The syntax of this command can be found in the Check Point 2000 Administration Guide on page 575.
This interface (the Active Check Device interface; see Interface Active Check Device on page 102) allows
three states to be reported: ok (= ACTIVE), init (= INIT) and problem (= DEAD). It does not allow blocking
at READY or STANDBY. Blocking at these states seems meaningless, although the LB (Load Balancing)
configuration device does block at READY.
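The register, unregister, report and list operations described above can be pictured with a small model. The following Python sketch is illustrative only: it is not the cphaprob implementation and does not reproduce its syntax. The names DeviceRegistry and ReportedState are hypothetical, and the choice to start a newly registered device in the init state is an assumption made for the example. Only the 16-character name limit and the ok/init/problem mapping come from the text.

```python
from enum import Enum

MAX_NAME_LEN = 16  # name length limit stated in the text


class ReportedState(Enum):
    """States a device may report, with the cluster state each one stands for."""
    OK = "ACTIVE"
    INIT = "INIT"
    PROBLEM = "DEAD"


class DeviceRegistry:
    """Hypothetical in-memory stand-in for the list of registered devices."""

    def __init__(self):
        self._devices = {}

    def register(self, name, state=ReportedState.INIT):
        # Assumption: a new device starts in the init state unless told otherwise.
        if not name or len(name) > MAX_NAME_LEN:
            raise ValueError(f"device name must be 1 to {MAX_NAME_LEN} characters")
        self._devices[name] = state

    def unregister(self, name):
        self._devices.pop(name, None)

    def report(self, name, state):
        if name not in self._devices:
            raise KeyError(f"device {name!r} is not registered")
        if not isinstance(state, ReportedState):
            raise ValueError("state must be ok, init or problem")
        self._devices[name] = state

    def list_devices(self):
        """Return (name, cluster state) pairs for every registered device."""
        return [(name, self._devices[name].value) for name in sorted(self._devices)]


registry = DeviceRegistry()
registry.register("fwd")
registry.report("fwd", ReportedState.PROBLEM)
print(registry.list_devices())  # [('fwd', 'DEAD')]
```

Note that READY and STANDBY are deliberately absent from ReportedState, mirroring the statement that this interface cannot report them.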
Each machine constantly reports (in the FWHAP_MY_STATE message) the number of interfaces which it has
determined to be up (it distinguishes between "inbound" and "outbound" communication). If one machine has
fewer "UP" interfaces than another machine in the cluster, the interface active check mechanism on the
machine with fewer "UP" interfaces reports a problem. Consequently, if an interface is disconnected on all
machines, no problem is detected. It should take about 2 seconds to discover an interface problem (it is
preferable to lose a few packets than to fail over unnecessarily).
The interface problem detection mechanism should be able to detect "unidirectional" problems, for example a
problem on an interface that can send but not receive packets.
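The comparison of "UP" interface counts can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual mechanism: the function names and the data layout are hypothetical, and the rule that an interface counts as up only when both its inbound and outbound checks pass is an assumption chosen so that a unidirectional fault lowers the count. The FWHAP_MY_STATE message format and the detection timing are not modelled.

```python
def count_up(interfaces):
    """Count interfaces whose inbound and outbound checks both pass.

    interfaces maps an interface name to an (inbound_ok, outbound_ok) pair.
    Treating an interface as up only when both directions work is an assumption.
    """
    return sum(1 for inbound_ok, outbound_ok in interfaces.values()
               if inbound_ok and outbound_ok)


def machines_with_interface_problem(cluster):
    """Return the machines that have fewer up interfaces than some peer."""
    up = {machine: count_up(ifs) for machine, ifs in cluster.items()}
    if not up:
        return []
    best = max(up.values())
    return sorted(machine for machine, count in up.items() if count < best)


# Machine B can send but not receive on eth1, so it has fewer up interfaces.
one_sided = {
    "A": {"eth0": (True, True), "eth1": (True, True)},
    "B": {"eth0": (True, True), "eth1": (False, True)},
}
print(machines_with_interface_problem(one_sided))  # ['B']

# eth1 is disconnected on every machine: the counts are equal, nothing is reported.
all_down = {
    "A": {"eth0": (True, True), "eth1": (False, False)},
    "B": {"eth0": (True, True), "eth1": (False, False)},
}
print(machines_with_interface_problem(all_down))  # []
```

The second case shows why a failure shared by all machines goes undetected: the check is relative to the other cluster members, not to an absolute expected count.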
VPN Fail-Over
By leveraging VPN-1 state table synchronization, which includes key exchange information, Check Point’s
High Availability maintains IKE-based VPN connections in the event of a fail-over.
VPN solutions without IKE fail-over drop all connections in the event of a failure, forcing users to
re-authenticate and re-establish connections. IKE fail-over delivers a seamless transition that is critical for
many VPN deployments.
Troubleshooting Fail-Over
The High Availability cluster contains one primary module and one or more secondary modules. When the
primary module fails, one of the secondary modules becomes Active.
The following tests can be used to check whether the fail-over capability is working properly, and to isolate
problems if it is not. Both HA modes are tested: Primary-up mode and Active-up mode.
In primary-up mode, the machine with the smallest ID should, if it can, be ACTIVE. This means that if the
primary machine goes down (failing over to the secondary machine) and then comes back up, the primary
machine will again filter connections, even though the secondary machine is still functioning properly.
In active-up mode, the machine that is currently active remains active (even when another machine in the
cluster with a smaller ID is OK) until it goes down, at which point the standby machine with the smallest
ID should take over.
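The difference between the two modes can be illustrated with a short Python sketch. It is a simplified model, not the product's election logic: the function choose_active, the mode strings and the representation of machine health as a boolean are all hypothetical.

```python
def choose_active(machines, current_active, mode):
    """Return the ID of the machine that should be ACTIVE, or None if none is OK.

    machines maps a machine ID to True when that machine is OK.
    current_active is the ID of the machine that is active now (or None).
    mode is "primary-up" or "active-up".
    """
    if mode not in ("primary-up", "active-up"):
        raise ValueError(f"unknown mode: {mode!r}")
    healthy = sorted(mid for mid, ok in machines.items() if ok)
    if not healthy:
        return None
    # Active-up: the current active machine keeps its role while it is OK.
    if mode == "active-up" and current_active in healthy:
        return current_active
    # Primary-up, or active-up after the active machine went down:
    # the OK machine with the smallest ID takes over.
    return healthy[0]


# Machine 1 failed earlier, machine 2 took over, and machine 1 is now back up.
recovered = {1: True, 2: True}
print(choose_active(recovered, current_active=2, mode="primary-up"))  # 1
print(choose_active(recovered, current_active=2, mode="active-up"))   # 2
```

With the same cluster state the two modes give different answers: primary-up hands the role back to machine 1, while active-up leaves machine 2 active until it goes down.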
Note: See also “Debugging High Availability”, page 106.