Platform LSF Administration Guide Version 6.2

Chapter 28
Running Parallel Jobs
Killing other running jobs prematurely does not affect the calculated run limit of an
interruptible backfill job. Slot-reserving jobs will not start sooner.
While the queue is checked for the consistency of its interruptible backfill, backfill, and
runtime specifications, the requeue exit value clause is neither verified nor executed
automatically. Configure requeue exit values according to your site policies.
In LSF MultiCluster, bhist does not display interruptible backfill information for
remote clusters.
A migrated job belonging to an interruptible backfill queue is migrated as if
LSB_MIG2PEND is set.
Configuring an interruptible backfill queue
Configure INTERRUPTIBLE_BACKFILL=seconds in the lowest priority queue in
the cluster. There can only be one interruptible backfill queue in the cluster.
Specify the minimum number of seconds for the job to be considered for
backfilling. This minimal time slice depends on the specific job properties; it must be
longer than at least one useful iteration of the job. Multiple queues may be created if a
site has jobs of distinctly different classes.
For example:
Begin Queue
QUEUE_NAME = background
# REQUEUE_EXIT_VALUES (set to whatever needed)
DESCRIPTION = Interruptible Backfill queue
BACKFILL = Y
INTERRUPTIBLE_BACKFILL = 1
RUNLIMIT = 10
PRIORITY = 1
End Queue
Interruptible backfill is disabled if BACKFILL and RUNLIMIT are not configured in
the queue.
The value of INTERRUPTIBLE_BACKFILL is the minimal time slice in seconds for
a job to be considered for backfill. The value depends on the specific job properties; it
must be longer than at least one useful iteration of the job. Multiple queues may be
created for different classes of jobs.
BACKFILL and RUNLIMIT must be configured in the queue.
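The dependency between these parameters can be checked before the cluster is reconfigured. The following is a minimal Python sketch, not an LSF tool. The function names parse_queues and check_interruptible_backfill are hypothetical, and the parser only understands simple KEY = value lines inside Begin Queue / End Queue sections. Treating only "Y" as an enabled BACKFILL value is also an assumption.

```python
SAMPLE = """\
Begin Queue
QUEUE_NAME = background
# REQUEUE_EXIT_VALUES (set to whatever needed)
DESCRIPTION = Interruptible Backfill queue
BACKFILL = Y
INTERRUPTIBLE_BACKFILL = 1
RUNLIMIT = 10
PRIORITY = 1
End Queue
"""


def parse_queues(text):
    """Return one dict of KEY -> value per Begin Queue / End Queue section."""
    queues = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        lowered = line.lower()
        if lowered == "begin queue":
            current = {}
        elif lowered == "end queue":
            if current is not None:
                queues.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().upper()] = value.strip()
    return queues


def check_interruptible_backfill(queues):
    """Return a list of problems for queues that set INTERRUPTIBLE_BACKFILL."""
    problems = []
    for queue in queues:
        if "INTERRUPTIBLE_BACKFILL" not in queue:
            continue
        name = queue.get("QUEUE_NAME", "<unnamed>")
        seconds = queue["INTERRUPTIBLE_BACKFILL"]
        if not seconds.isdigit() or int(seconds) <= 0:
            problems.append(f"{name}: INTERRUPTIBLE_BACKFILL must be a positive number of seconds")
        # Assumption: only "Y" (any case) counts as backfill being enabled.
        if queue.get("BACKFILL", "").upper() != "Y":
            problems.append(f"{name}: BACKFILL is not configured")
        if "RUNLIMIT" not in queue:
            problems.append(f"{name}: RUNLIMIT is not configured")
    return problems
```

Calling check_interruptible_backfill(parse_queues(SAMPLE)) returns an empty list for the example queue. A queue that sets INTERRUPTIBLE_BACKFILL without BACKFILL or RUNLIMIT is reported once for each missing parameter.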
RUNLIMIT corresponds to a maximum time slice for backfill, and should be
configured so that the wait period for the new jobs submitted to the queue is acceptable
to users. 10 minutes of runtime is a common value.
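The relationship between the two bounds can be illustrated with a short Python sketch. This is only one reading of the descriptions above and is not LSF scheduler logic. The function name backfill_time_slice and its parameters are hypothetical. It assumes INTERRUPTIBLE_BACKFILL is given in seconds and RUNLIMIT in minutes, as in the example queue.

```python
from typing import Optional


def backfill_time_slice(window_seconds: int,
                        min_slice_seconds: int,
                        runlimit_minutes: int) -> Optional[int]:
    """Return the usable time slice in seconds, or None if the window is too short.

    window_seconds    -- free time available before the slots are needed (assumed input)
    min_slice_seconds -- the queue's INTERRUPTIBLE_BACKFILL value
    runlimit_minutes  -- the queue's RUNLIMIT value
    """
    if window_seconds < min_slice_seconds:
        # Shorter than the minimal time slice: not considered for backfill.
        return None
    # RUNLIMIT acts as the maximum time slice.
    return min(window_seconds, runlimit_minutes * 60)
```

Under these assumptions, with INTERRUPTIBLE_BACKFILL = 1 and RUNLIMIT = 10, a one-hour window is capped at 600 seconds, while a five-minute window is used in full.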
You should configure REQUEUE_EXIT_VALUES for the queue so that resubmission
is automatic. In order to terminate completely, jobs must have specific exit values:
If jobs are checkpointable, use their checkpoint exit value.
If jobs periodically save data on their own, use the SIGTERM exit value.
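For jobs that save their own data, the job itself can catch SIGTERM and exit with a value listed in REQUEUE_EXIT_VALUES. The following Python sketch shows one way to structure such a job. All names here (StopFlag, run, main, the progress file) are hypothetical. The exit value 143 follows the common shell convention of 128 plus the signal number (15 for SIGTERM) and is an assumption: use whatever value your site configures in REQUEUE_EXIT_VALUES. The sketch also assumes the job receives SIGTERM when its time slice ends.

```python
import json
import os
import signal
import sys

# Assumption: 143 = 128 + 15 (SIGTERM). Must match REQUEUE_EXIT_VALUES at your site.
REQUEUE_EXIT_CODE = 143


class StopFlag:
    """Records that SIGTERM arrived so the loop can stop between iterations."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        self.requested = True


def load_progress(path):
    """Return the index of the next iteration to run (0 on a fresh start)."""
    if not os.path.exists(path):
        return 0
    with open(path) as fh:
        return json.load(fh)["next_iteration"]


def save_progress(path, next_iteration):
    """Write to a temporary file, then rename it over the progress file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"next_iteration": next_iteration}, fh)
    os.replace(tmp, path)


def run(path, total_iterations, work, stop):
    """Run the remaining iterations; return 0 when finished, else the requeue code."""
    i = load_progress(path)
    while i < total_iterations:
        if stop.requested:
            return REQUEUE_EXIT_CODE
        work(i)
        i += 1
        save_progress(path, i)
    return 0


def main(path, total_iterations, work):
    """Entry point for a real job: install the handler, then exit with run()'s code."""
    stop = StopFlag()
    signal.signal(signal.SIGTERM, stop.handler)
    sys.exit(run(path, total_iterations, work, stop))
```

Because progress is saved after every iteration, a requeued job resumes from the next unfinished iteration. When all iterations are done, the job exits with 0, which is not a requeue value, so it terminates completely.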