Platform LSF Administration Guide Version 6.2

Tuning LSF for Large Clusters
To enable and sustain large clusters, you need to tune LSF for efficient querying,
dispatching, and event log management.
Managing scheduling performance
For fast job dispatching in a large cluster, configure the following parameters:
LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.conf
The maximum number of jobs the scheduler can dispatch in one scheduling session.
Some operating systems, such as Linux and AIX, let you increase the number of file
descriptors that can be allocated on the master host. You do not need to limit the
number of file descriptors to 1024 if you want fast job dispatching. To take
advantage of the greater number of file descriptors, you must set
LSB_MAX_JOB_DISPATCH_PER_SESSION to a value greater than 300.
MAX_SBD_CONNS in lsb.params
The maximum number of open file connections between mbatchd and sbatchd. Set
MAX_SBD_CONNS to the same value as LSB_MAX_JOB_DISPATCH_PER_SESSION.
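Because the two parameters live in different files, it can help to confirm that they agree before restarting any daemons. The following is a minimal sketch, not part of LSF: get_param and conns_match_dispatch are hypothetical helper names, and the paths to lsf.conf and lsb.params are passed in as arguments because their locations are site-specific.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command): print the value of the last
# uncommented NAME=VALUE entry for parameter $1 in file $2.
# Spaces around "=" and trailing "# comments" are tolerated.
get_param() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$2" | tail -n 1
}

# Hypothetical helper: succeed only if both parameters are set and equal.
# $1 = path to lsf.conf, $2 = path to lsb.params
conns_match_dispatch() {
    dispatch=$(get_param LSB_MAX_JOB_DISPATCH_PER_SESSION "$1")
    conns=$(get_param MAX_SBD_CONNS "$2")
    [ -n "$dispatch" ] && [ "$dispatch" = "$conns" ]
}
```

Called as conns_match_dispatch followed by the two file paths, the function returns a nonzero status when either parameter is missing or the values differ, so it can gate a restart script.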
To enable fast job dispatch
1. Log in to the LSF master host as the root user.
2. Increase the system-wide file descriptor limit of your operating system if you have
   not already done so.
3. In lsf.conf, set the parameter LSB_MAX_JOB_DISPATCH_PER_SESSION to a
   value greater than 300.
   For example:
   LSB_MAX_JOB_DISPATCH_PER_SESSION=1024
   Ensure that the value of LSB_MAX_JOB_DISPATCH_PER_SESSION is less than
   the maximum number of allowed open file descriptors.
4. In lsb.params, set the parameter MAX_SBD_CONNS to the same value as
   LSB_MAX_JOB_DISPATCH_PER_SESSION.
   For example:
   MAX_SBD_CONNS=1024
5. In the shell you used to increase the file descriptor limit, shut down the LSF batch
   daemons on the master host:
   % badmin hshutdown
   % badmin mbdrestart
   Because sbatchd is already shut down when badmin mbdrestart runs, mbatchd exits
   and is not restarted until sbatchd is started again.
6. Run badmin hstartup to restart the LSF batch daemons on the master host.
7. Run badmin hrestart all to restart every sbatchd in the cluster.
When you shut down the batch daemons on the master host, all LSF services are
temporarily unavailable, but existing jobs are not affected. When mbatchd is later
started by sbatchd, its previous status is restored and job scheduling continues.
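Before shutting down the daemons, the constraints from step 3 can be checked in the same shell. The sketch below is illustrative only: check_dispatch_value is a hypothetical helper, not an LSF command, and it assumes that the limit reported by ulimit -n in the current shell is the one the batch daemons will inherit.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command).
# $1 = intended LSB_MAX_JOB_DISPATCH_PER_SESSION value
# $2 = file descriptor limit to check against (defaults to `ulimit -n`)
check_dispatch_value() {
    value=$1
    fd_limit=${2:-$(ulimit -n)}
    # The guide asks for a value greater than 300.
    if [ "$value" -le 300 ]; then
        echo "too-low"
        return 1
    fi
    # The value must stay below the allowed number of open file descriptors.
    if [ "$fd_limit" != "unlimited" ] && [ "$value" -ge "$fd_limit" ]; then
        echo "exceeds-fd-limit"
        return 1
    fi
    echo "ok"
}
```

Running check_dispatch_value 1024 with no second argument compares 1024 against the current shell's limit. Passing the limit explicitly is useful for validating a planned configuration on another host.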
Scheduling tip
In large clusters, enable the scheduler to run constantly. Define the parameter
JOB_SCHEDULING_INTERVAL=0 in lsb.params: