LSF Version 7.3 - Administering Platform LSF
Achieving Performance and Scalability
Enable continuous scheduling
1 To enable the scheduler to run continuously, define the parameter
JOB_SCHEDULING_INTERVAL=0 in lsb.params.
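To confirm the setting before reconfiguring the cluster, a site could check the file with a small script. The following is a minimal sketch, not an LSF tool: it assumes the parameter sits between Begin Parameters and End Parameters lines, that # starts a comment, and the helper names parse_parameters and continuous_scheduling_enabled are hypothetical.

```python
def parse_parameters(text):
    """Return NAME -> value pairs found inside the Parameters section.

    Assumption: the section is delimited by 'Begin Parameters' and
    'End Parameters', and '#' starts a comment.
    """
    params = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        words = line.lower().split()
        if words == ["begin", "parameters"]:
            in_section = True
        elif words == ["end", "parameters"]:
            in_section = False
        elif in_section and "=" in line:
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
    return params


def continuous_scheduling_enabled(text):
    """True when JOB_SCHEDULING_INTERVAL is set to 0 in the given text."""
    return parse_parameters(text).get("JOB_SCHEDULING_INTERVAL") == "0"


# Illustrative file content only, not a complete lsb.params.
SAMPLE = """\
Begin Parameters
JOB_SCHEDULING_INTERVAL = 0   # run the scheduler continuously
End Parameters
"""
```

Calling continuous_scheduling_enabled(SAMPLE) on the sample text above reports that continuous scheduling is enabled.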
Limiting the number of batch queries
In large clusters, the volume of job queries can grow very quickly. If your site
sees heavy job query traffic, you can tune LSF to limit the number of job queries
that mbatchd can handle. This helps decrease the load on the master host.
If a job information query is sent after the limit has been reached, an error
message is displayed and mbatchd keeps retrying at one-second intervals. If the
number of job queries later drops below the limit, mbatchd handles the query.
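The retry behavior can be pictured as a simple loop. The following is an illustrative model only, not LSF code: query_with_retry, try_query and max_attempts are hypothetical names, and the one-second interval is the only value taken from the description above.

```python
import time


def query_with_retry(try_query, interval=1.0, max_attempts=None, sleep=time.sleep):
    """Call try_query until it succeeds, waiting `interval` seconds between tries.

    try_query is a hypothetical callable returning (accepted, result);
    accepted is False while the query limit is still reached.
    Returns (result, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        accepted, result = try_query()
        if accepted:
            return result, attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError("query limit still reached after %d attempts" % attempts)
        sleep(interval)  # wait before the next retry
```

The sleep argument is injectable so the loop can be exercised without real delays. The max_attempts cap is an addition for safety in the sketch; the text above does not describe any cap.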
You define the maximum number of concurrent job queries that mbatchd handles
with the parameter MAX_CONCURRENT_JOB_QUERY in lsb.params:
◆ If mbatchd is using multithreading, a dedicated query port is defined by the
parameter LSB_QUERY_PORT in lsf.conf. When mbatchd has a dedicated query port,
the value of MAX_CONCURRENT_JOB_QUERY sets the maximum number of queries that
can be handled by each child mbatchd that is forked by mbatchd. This means that
the total number of job queries handled can be more than the number specified
by MAX_CONCURRENT_JOB_QUERY (MAX_CONCURRENT_JOB_QUERY multiplied by the number
of child daemons forked by mbatchd).
◆ If mbatchd is not using multithreading, the value of MAX_CONCURRENT_JOB_QUERY
sets the maximum total number of job queries that can be handled by mbatchd.
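The arithmetic for the two cases can be sketched as follows. This is an illustration of the rule stated above, not an LSF API: the function name and the child_count argument are hypothetical, and None stands for "unlimited".

```python
def max_total_job_queries(max_concurrent_job_query, multithreaded, child_count=1):
    """Upper bound on job queries handled at once, or None if unlimited.

    max_concurrent_job_query: configured limit, or None when not set.
    multithreaded: True when mbatchd uses a dedicated query port.
    child_count: number of child mbatchd daemons currently forked.
    """
    if max_concurrent_job_query is None:
        return None  # no limit configured
    if multithreaded:
        # The limit applies to each child mbatchd separately.
        return max_concurrent_job_query * child_count
    # Without multithreading the limit is the total for mbatchd.
    return max_concurrent_job_query
```

For example, with a limit of 20 and three child daemons, the multithreaded case allows up to 60 queries in total, while the single-threaded case allows 20.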
Syntax
MAX_CONCURRENT_JOB_QUERY=max_query
Where:
max_query
Specifies the maximum number of job queries that can be handled by mbatchd.
Valid values are positive integers between 1 and 100. The default value is unlimited.
Examples
MAX_CONCURRENT_JOB_QUERY=20
Specifies that no more than 20 queries can be handled by mbatchd.
MAX_CONCURRENT_JOB_QUERY=101
Incorrect value: 101 is outside the valid range of 1 to 100. The default value
is used, so mbatchd handles an unlimited number of job queries.
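The validation rule can be summarized in a few lines. This is a minimal sketch of the documented behavior (values from 1 to 100 are accepted, anything else falls back to the unlimited default), not the check that mbatchd actually performs; effective_query_limit is a hypothetical name and None stands for "unlimited".

```python
def effective_query_limit(raw_value):
    """Return the limit that would take effect, or None for unlimited.

    raw_value is the text to the right of MAX_CONCURRENT_JOB_QUERY=.
    """
    text = raw_value.strip()
    # Accept plain ASCII digits only; anything else is treated as invalid.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    if 1 <= value <= 100:
        return value
    return None  # out of range: the default (unlimited) applies
```

With this rule, the value 20 yields a limit of 20, while 101 yields None (unlimited), matching the two examples above.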
Improving the speed of host status updates
To improve the speed with which mbatchd obtains and reports host status,
configure the parameter LSB_SYNC_HOST_STAT_LIM in the file lsb.params.
This also improves the speed with which LSF reschedules jobs: the sooner LSF
knows that a host has become unavailable, the sooner LSF reschedules any
rerunnable jobs executing on that host.
For example, during maintenance operations, the cluster administrator might need
to shut down half of the hosts at once. LSF can quickly update the host status and
reschedule any rerunnable jobs that were running on the unavailable hosts.