Platform LSF Administration Guide Version 6.2

Changing Default LIM Behavior to Improve Performance
If you want the rejected hosts to be part of the cluster, ensure that the number of load indices defined in lsf.cluster.cluster_name and lsf.shared is identical for all master candidates, then restart the LIMs on the master and all master candidates:
% lsadmin limrestart hostA hostB hostC
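Before restarting, it can help to confirm that each candidate really does define the same number of load indices. The following is a minimal sketch, not part of the product: it counts the entries in the Resource section of lsf.shared. The /tmp paths and the two sample file fragments are illustrative stand-ins for files copied from each candidate host; on a real host the file lives under the LSF configuration directory.

```shell
# Sketch: count resource entries in a copy of lsf.shared.
# Counts lines between "Begin Resource" and "End Resource",
# excluding the markers and the RESOURCENAME header line.
count_indices() {
    awk '/^Begin Resource/ {in_sec=1; next}
         /^End Resource/   {in_sec=0}
         in_sec && !/^RESOURCENAME/ && NF > 0 {n++}
         END {print n+0}' "$1"
}

# Sample fragments standing in for files fetched from hostA and hostB.
cat > /tmp/lsf.shared.hostA <<'EOF'
Begin Resource
RESOURCENAME  TYPE    INTERVAL INCREASING  DESCRIPTION
mips          Boolean ()       ()          (MIPS architecture)
scratch       Numeric 30       N           (Scratch space in MB)
End Resource
EOF
cat > /tmp/lsf.shared.hostB <<'EOF'
Begin Resource
RESOURCENAME  TYPE    INTERVAL INCREASING  DESCRIPTION
mips          Boolean ()       ()          (MIPS architecture)
End Resource
EOF

a=$(count_indices /tmp/lsf.shared.hostA)
b=$(count_indices /tmp/lsf.shared.hostB)
if [ "$a" != "$b" ]; then
    echo "MISMATCH: hostA=$a hostB=$b"
fi
```

Once the counts agree on every candidate, run lsadmin limrestart as shown above.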
LSF_MASTER_LIST defined, and master host goes down
If LSF_MASTER_LIST is defined and the elected master host goes down, and the number of load indices in lsf.cluster.cluster_name or lsf.shared for the newly elected master differs from the number in the files of the master that went down, LSF rejects all master candidates whose files do not have the same number of load indices as the newly elected master. LSF also rejects all slave-only hosts. This can leave only the newly elected master as part of the cluster.
A warning is logged in the log file lim.log.new_master_host_name, and the cluster continues to run without the rejected hosts.
To resolve this, restart all LIMs from the current master host:
% lsadmin limrestart all
All slave-only hosts are then considered part of the cluster. Master candidates with a different number of load indices in their lsf.cluster.cluster_name or lsf.shared files are still rejected.
When the master that went down comes back up, you have the same situation described in “LSF_MASTER_LIST defined” on page 561: ensure that the load indices defined in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates, then restart the LIMs on all master candidates.
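A quick way to confirm the candidates are back in sync is to compare each candidate's configuration files against the current master's copy. The sketch below is illustrative only: the /tmp/&lt;host&gt;/ staging layout, the host names, and the sample file contents are assumptions (in practice the files live in the LSF configuration directory on each host, and lsf.cluster.cluster_name should be compared the same way).

```shell
# Hypothetical sketch: flag master candidates whose lsf.shared differs
# from the current master's copy before restarting their LIMs.
for h in hostA hostB hostC; do mkdir -p "/tmp/$h"; done

# Stand-in file contents: hostB's copy is missing one index.
printf 'mips\nscratch\n' > /tmp/hostA/lsf.shared
printf 'mips\n'          > /tmp/hostB/lsf.shared
printf 'mips\nscratch\n' > /tmp/hostC/lsf.shared

# hostA plays the role of the current master here.
for h in hostB hostC; do
    if ! diff -q /tmp/hostA/lsf.shared "/tmp/$h/lsf.shared" >/dev/null; then
        echo "lsf.shared differs between hostA and $h"
    fi
done
```

Only after every candidate's files match should you restart the LIMs on all master candidates.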