LSF Version 7.3 - Administering Platform LSF

Tuning LSF for Large Clusters
When you define this parameter, mbatchd periodically obtains the host status from
the master LIM, and then verifies the status by polling each sbatchd at an interval
defined by the parameters MBD_SLEEP_TIME and LSB_MAX_PROBE_SBD.
Managing your users' ability to move jobs in a queue
JOB_POSITION_CONTROL_BY_ADMIN=Y allows an LSF administrator to control
whether users can use btop and bbot to move jobs to the top and bottom of queues.
When set, only the LSF administrator (including any queue administrators) can use
bbot and btop to move jobs within a queue. A user attempting to use bbot or btop
receives the error “User permission denied.”
REMEMBER: You must be an LSF administrator to set this parameter.
Managing the number of pending reasons
For efficient, scalable management of pending reasons, use
CONDENSE_PENDING_REASONS=Y in lsb.params to condense all the host-based
pending reasons into one generic pending reason.
If a job has no other main pending reason, bjobs -p or bjobs -l displays the
following:
Individual host based reasons
If you condense host-based pending reasons, but require a full pending reason list,
you can run the following command:
badmin diagnose <job_ID>
REMEMBER: You must be an LSF administrator or a queue administrator to run this command.
Achieving efficient event switching
Periodic switching of the event file can degrade the performance of mbatchd, which
automatically backs up and rewrites the events file after every 1000 batch job
completions. The old lsb.events file is moved to lsb.events.1, and each old
lsb.events.n file is moved to lsb.events.n+1.
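The renaming scheme described above can be illustrated with a short Python sketch. This is not how mbatchd is implemented; it only mimics the file shifting on an arbitrary directory. The function name switch_event_file and its arguments are hypothetical, and creating an empty new lsb.events is a simplification of the rewrite that mbatchd performs.

```python
import os
import re


def switch_event_file(log_dir, base="lsb.events"):
    """Shift base.n to base.n+1, move base to base.1, start an empty base.

    Hypothetical helper for illustration only; not part of LSF.
    """
    pattern = re.compile(re.escape(base) + r"\.(\d+)")
    numbers = []
    for name in os.listdir(log_dir):
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))

    # Rename the highest-numbered file first so nothing is overwritten.
    for n in sorted(numbers, reverse=True):
        os.rename(
            os.path.join(log_dir, f"{base}.{n}"),
            os.path.join(log_dir, f"{base}.{n + 1}"),
        )

    current = os.path.join(log_dir, base)
    if os.path.exists(current):
        os.rename(current, os.path.join(log_dir, f"{base}.1"))

    # Simplification: begin a new, empty events file.
    with open(current, "w"):
        pass
```

Renaming from the highest suffix downward is the design choice that matters here: shifting in ascending order would overwrite lsb.events.2 with lsb.events.1 before lsb.events.2 had been moved.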
Change the frequency of event switching with the following two parameters in
lsb.params:
MAX_JOB_NUM specifies the number of batch jobs to complete before
lsb.events is backed up and moved to lsb.events.1. The default value is 1000.
MIN_SWITCH_PERIOD controls how frequently mbatchd checks the number of
completed batch jobs.
The two parameters work together. Specify the MIN_SWITCH_PERIOD value in
seconds.
For example:
MAX_JOB_NUM=1000
MIN_SWITCH_PERIOD=7200
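One way to read how these two values interact is sketched below in Python. It assumes that mbatchd looks at the completed-job count only once per MIN_SWITCH_PERIOD and switches the events file only if at least MAX_JOB_NUM completions have been logged by then. The function should_switch and its argument names are hypothetical; this is a model of the decision, not LSF code.

```python
def should_switch(completed_jobs, seconds_since_last_check,
                  max_job_num=1000, min_switch_period=7200):
    """Return True if the events file would be switched at this moment.

    Assumed model: no check happens before min_switch_period seconds
    have elapsed; at a check, the file is switched only when at least
    max_job_num batch jobs have completed.
    """
    if seconds_since_last_check < min_switch_period:
        # Too early: the job count is not examined yet.
        return False
    return completed_jobs >= max_job_num


# 1000 jobs finish after five minutes: no switch until the period ends.
early = should_switch(completed_jobs=1000, seconds_since_last_check=300)

# Two hours have passed and 1000 jobs completed: switch.
on_time = should_switch(completed_jobs=1000, seconds_since_last_check=7200)

# Two hours have passed but only 999 jobs completed: no switch.
too_few = should_switch(completed_jobs=999, seconds_since_last_check=7200)
```

Under this model, with the example settings the events file is switched at most once every two hours, however quickly jobs complete.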