Platform LSF Administration Guide Version 6.2
Tuning LSF for Large Clusters
Administering Platform LSF
Managing the number of pending reasons
For efficient, scalable management of pending reasons, use CONDENSE_PENDING_REASONS in lsb.params to condense all the host-based pending reasons into one generic pending reason.
Syntax
CONDENSE_PENDING_REASONS=Y
If a job has no other main pending reason, bjobs -p or bjobs -l will display the
following:
Individual host based reasons
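To show why condensing helps in a large cluster, the following Python sketch models the host-based pending reasons of a job that has no other main pending reason. It collapses them into the single generic line when condensing is enabled. The function `summarize_host_reasons`, its data model, and the "host: reason" output format are hypothetical illustrations. They are not how mbatchd stores or prints pending reasons.

```python
from typing import Dict, List

# Generic message quoted in the text above for condensed host-based reasons.
GENERIC_HOST_REASON = "Individual host based reasons"


def summarize_host_reasons(host_reasons: Dict[str, str], condense: bool) -> List[str]:
    """Hypothetical model of the host-based pending reasons a user would see.

    host_reasons maps host name -> reason, for a job with no other main
    pending reason. condense mirrors CONDENSE_PENDING_REASONS=Y.
    """
    if not host_reasons:
        return []
    if condense:
        # Every host-based reason collapses into one generic line,
        # so the output size no longer depends on the number of hosts.
        return [GENERIC_HOST_REASON]
    # Without condensing, each host contributes its own line
    # (the "host: reason" format is an illustrative assumption).
    return [f"{host}: {reason}" for host, reason in sorted(host_reasons.items())]
```

With 5000 hosts that each report a reason, the uncondensed list has 5000 entries, while the condensed list always has one. This constant-size output is what makes the condensed form scale.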
If you condense host-based pending reasons but require a full pending reason list, run the following command:
% badmin diagnose <jobId>
You must be an LSF administrator or a queue administrator to run this
command.
Achieving efficient event switching
Periodic switching of the event file can degrade the performance of mbatchd, which by default automatically backs up and rewrites the events file after every 1000 batch job completions. The old lsb.events file is moved to lsb.events.1, and each old lsb.events.n file is moved to lsb.events.n+1.
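The renaming scheme has one subtlety. The numbered backups must be shifted starting from the highest number, or each rename would overwrite the next older file. The following Python sketch shows that ordering. The function name `rotate_events` and the directory layout are hypothetical. The sketch only renames existing files and does not model how mbatchd rewrites the new lsb.events.

```python
import os
import re
from pathlib import Path


def rotate_events(log_dir: Path, base: str = "lsb.events") -> None:
    """Shift lsb.events.n -> lsb.events.n+1, then lsb.events -> lsb.events.1.

    Hypothetical sketch of the renaming scheme only; it is not mbatchd's
    implementation and it does not create a new events file.
    """
    pattern = re.compile(rf"^{re.escape(base)}\.(\d+)$")
    # Collect the numeric suffixes of existing backups (lsb.events.1, ...).
    suffixes = [
        int(m.group(1))
        for name in os.listdir(log_dir)
        if (m := pattern.match(name))
    ]
    # Rename from the highest suffix down so no backup is overwritten.
    for n in sorted(suffixes, reverse=True):
        os.rename(log_dir / f"{base}.{n}", log_dir / f"{base}.{n + 1}")
    # Finally, the current events file becomes the newest backup.
    current = log_dir / base
    if current.exists():
        os.rename(current, log_dir / f"{base}.1")
```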
Change the frequency of event switching with the following two parameters in
lsb.params:
◆ MAX_JOB_NUM specifies the number of batch jobs to complete before lsb.events is backed up and moved to lsb.events.1. The default value is 1000.
◆ MIN_SWITCH_PERIOD controls how frequently mbatchd checks the number of completed batch jobs.
The two parameters work together. Specify the
MIN_SWITCH_PERIOD value in
seconds.
For example:
MAX_JOB_NUM=1000
MIN_SWITCH_PERIOD=7200
This instructs mbatchd to check every two hours whether the events file has logged 1000 batch job completions. Together, the two parameters control the frequency of events file switching as follows:
◆ After two hours, mbatchd checks the number of completed batch jobs. If 1000 completed jobs have been logged, it switches the events file.
◆ If 1000 jobs complete within the first five minutes, mbatchd does not switch the events file until the end of the two-hour period.
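The interaction between MAX_JOB_NUM and MIN_SWITCH_PERIOD can be modeled as a periodic check. The following Python sketch is a simplified model of the two cases listed above. It assumes that checks fall at exact multiples of MIN_SWITCH_PERIOD from time zero and that the completion count restarts after each switch. The function `switch_times` and its `horizon` argument are hypothetical, and mbatchd's actual check timing may differ.

```python
from typing import List


def switch_times(
    completion_times: List[float],
    max_job_num: int = 1000,
    min_switch_period: float = 7200.0,
    horizon: float = 86400.0,
) -> List[float]:
    """Return the times (in seconds) at which the events file would switch.

    Hypothetical model: every min_switch_period seconds, count the job
    completions logged since the last switch, and switch if the count has
    reached max_job_num.
    """
    completions = sorted(completion_times)
    switches: List[float] = []
    logged = 0  # completions logged since the last switch
    idx = 0     # index of the next completion not yet counted
    check = min_switch_period
    while check <= horizon:
        # Count every completion that happened up to this check.
        while idx < len(completions) and completions[idx] <= check:
            logged += 1
            idx += 1
        if logged >= max_job_num:
            switches.append(check)
            logged = 0  # the new events file starts with no completions
        check += min_switch_period
    return switches
```

For example, with the settings shown earlier, 1000 completions spread over the first five minutes produce a single switch at 7200 seconds, which matches the second case above. Only 999 completions produce no switch at all.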
Automatic load updating
Periodically, the LIM daemons exchange load information. In large clusters, let LSF update the load information automatically by dynamically adjusting the exchange period based on the load.