Platform LSF Reference Version 6.2

lsb.queues
Specify the minimum number of seconds for the job to be considered for
backfilling. This minimal time slice depends on the specific job properties; it must be
longer than at least one useful iteration of the job. Multiple queues may be created if a
site has jobs of distinctly different classes.
An interruptible backfill job:
  Starts as a regular job and is killed when it exceeds the queue runtime limit
OR
  Is started for backfill whenever there is a backfill time slice longer than the specified
  minimal time, and is killed before the slot-reservation job is about to start
The queue RUNLIMIT corresponds to a maximum time slice for backfill, and should
be configured so that the wait period for the new jobs submitted to the queue is
acceptable to users. 10 minutes of runtime is a common value.
You should configure REQUEUE_EXIT_VALUES for interruptible backfill queues.
BACKFILL and RUNLIMIT must be configured in the queue. The queue is disabled if
BACKFILL and RUNLIMIT are not configured.
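As a rough illustration of the requirements above, an interruptible backfill queue in lsb.queues might look like the following sketch. The queue name, priority, requeue exit value, and the 60-second minimum are hypothetical placeholders. The example also assumes that the parameter described in this section is INTERRUPTIBLE_BACKFILL and that RUNLIMIT is given in minutes; check both against your own site configuration.

```
# Hypothetical example; the queue name and all values are placeholders
Begin Queue
QUEUE_NAME             = ib_backfill
PRIORITY               = 20
# BACKFILL and RUNLIMIT are required for interruptible backfill
BACKFILL               = Y
# Maximum backfill time slice (assumed to be in minutes)
RUNLIMIT               = 10
# Minimum time slice in seconds (assumed value)
INTERRUPTIBLE_BACKFILL = 60
# Assumed exit code for which the job is requeued
REQUEUE_EXIT_VALUES    = 125
DESCRIPTION            = Interruptible backfill queue
End Queue
```

With these assumed values, a job in this queue is considered for any backfill slice longer than 60 seconds and never runs for more than 10 minutes at a time.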
Assumptions and limitations:
The interruptible backfill job will hold the slot-reserving job start until its calculated
start time, in the same way as a regular backfill job. The interruptible backfill job will
not be preempted in any way other than being killed when its time comes.
While the queue is checked for the consistency of interruptible backfill, backfill and
runtime specifications, the requeue exit value clause is not verified, nor executed
automatically. Configure requeue exit values according to your site policies.
The interruptible backfill job must be able to do at least one unit of useful
calculations and save its data within the minimal time slice, and be able to continue
its calculations after it has been restarted.
The interruptible backfill paradigm does not explicitly prohibit running parallel jobs
distributed across multiple nodes; however, the chance of success of such a job is
close to zero.
Default
Undefined (no interruptible backfilling)
JOB_ACCEPT_INTERVAL
Syntax
JOB_ACCEPT_INTERVAL=integer
Description
The number you specify is multiplied by the value of MBD_SLEEP_TIME in
lsb.params (60 seconds by default). The result of the calculation is the
number of seconds to wait after dispatching a job to a host, before dispatching a second
job to the same host.
If 0 (zero), a host may accept more than one job in each dispatch turn. By default, there
is no limit to the total number of jobs that can run on a host, so if this parameter is set
to 0, a very large number of jobs might be dispatched to a host all at once. This can
overload your system to the point that it will be unable to create any more processes. It
is not recommended to set this parameter to 0.
JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides
JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).
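To make the arithmetic concrete, the fragment below sketches a queue-level setting. The queue name and the value 2 are hypothetical, and the 60-second figure assumes that MBD_SLEEP_TIME is left at its default in lsb.params.

```
# Hypothetical queue fragment in lsb.queues
Begin Queue
QUEUE_NAME          = normal
# 2 x MBD_SLEEP_TIME (60 seconds by default) = 120 seconds
# to wait before dispatching a second job to the same host
JOB_ACCEPT_INTERVAL = 2
End Queue
```

With these assumed values, the queue-level setting of 2 applies to this queue even if lsb.params specifies a different cluster-wide JOB_ACCEPT_INTERVAL.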