LSF Version 7.3 - Administering Platform LSF

Tuning LIM
lsadmin limrestart hostA hostB hostC
LSF_MASTER_LIST defined, and master host goes down
If LSF_MASTER_LIST is defined and the elected master host goes down, and the number of load indices in lsf.cluster.cluster_name or lsf.shared for the newly elected master differs from the number of load indices in the files of the master that went down, LSF rejects all master candidates that do not have the same number of load indices in their files as the newly elected master. LSF also rejects all slave-only hosts. This can leave only the newly elected master as part of the cluster.
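Before restarting anything, it can help to confirm whether the load index definitions really differ between two master candidates. The following is a minimal sketch; the hostnames and paths are illustrative (in a real cluster you would compare each host's copy of lsf.shared under its LSF_ENVDIR), and two local copies are simulated here only to show how a mismatch surfaces:

```shell
# Simulate per-host copies of lsf.shared for two master candidates.
# Paths and resource names are illustrative, not real cluster files.
mkdir -p /tmp/limcheck/hostA /tmp/limcheck/hostB
printf 'Begin Resource\ncpuload Numeric 60 Y\nEnd Resource\n' \
  > /tmp/limcheck/hostA/lsf.shared
printf 'Begin Resource\ncpuload Numeric 60 Y\nmemload Numeric 60 Y\nEnd Resource\n' \
  > /tmp/limcheck/hostB/lsf.shared

# A non-empty diff means the two candidates define a different set of
# load indices, so LIM on the new master would reject the other host.
diff /tmp/limcheck/hostA/lsf.shared /tmp/limcheck/hostB/lsf.shared \
  || echo "load index definitions differ"
```

A clean (empty) diff across all master candidates means the mismatch lies elsewhere; a non-empty diff pinpoints the file to reconcile before restarting LIMs.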
A warning is logged in the log file lim.log.new_master_host_name, and the cluster continues to run without the rejected hosts.
To resolve this, from the current master host, restart all LIMs:
lsadmin limrestart all
All slave-only hosts are once again considered part of the cluster. Master candidates whose lsf.cluster.cluster_name or lsf.shared files define a different number of load indices are still rejected.
When the master host that went down comes back up, you have the same situation as described in LSF_MASTER_LIST defined on page 664. Ensure that the load indices defined in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates, then restart the LIMs on all master candidates.
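The consistency check across all master candidates can be sketched as a checksum sweep against the current master's copy of the file. Hostnames and paths below are illustrative, and per-host copies are simulated locally (in practice you would first fetch each candidate's lsf.shared, for example over a shared filesystem or with scp):

```shell
# Simulate one lsf.shared copy per master candidate.
# hostA stands in for the current master; hostC is deliberately out of sync.
for h in hostA hostB hostC; do mkdir -p /tmp/limsync/$h; done
printf 'cpuload Numeric 60 Y\n' > /tmp/limsync/hostA/lsf.shared
printf 'cpuload Numeric 60 Y\n' > /tmp/limsync/hostB/lsf.shared
printf 'cpuload Numeric 60 Y\nio Numeric 60 Y\n' > /tmp/limsync/hostC/lsf.shared

# Flag every candidate whose copy differs from the master's copy.
ref=$(cksum < /tmp/limsync/hostA/lsf.shared)
for h in hostB hostC; do
  sum=$(cksum < /tmp/limsync/$h/lsf.shared)
  [ "$sum" = "$ref" ] || echo "$h: lsf.shared differs from master copy"
done
```

Once every candidate matches the master's copy, restart the LIMs on all master candidates (for example with lsadmin limrestart, as shown earlier in this section).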