LSF Version 7.3 - Administering Platform LSF

Job Requeue and Job Rerun
Automatic Job Requeue
You can configure a queue to automatically requeue a job if it exits with a specified
exit value.
The job is requeued to the head of the queue from which it was dispatched, unless the LSB_REQUEUE_TO_BOTTOM parameter in lsf.conf is set.
When a job is requeued, LSF does not save the output from the failed run and does not send mail to notify the user.
A job terminated by a signal is not requeued.
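The queue-position and signal rules above can be modeled in a few lines. This Python sketch is illustrative only: requeue_job, eligible_for_requeue, and the deque standing in for a queue's pending jobs are hypothetical names and do not reflect how LSF stores pending jobs internally. The requeue_to_bottom flag stands in for LSB_REQUEUE_TO_BOTTOM being set in lsf.conf.

```python
from collections import deque


def requeue_job(pending: deque, job_id: str, requeue_to_bottom: bool = False) -> None:
    """Return a requeued job to the pending list of the queue it came from.

    Hypothetical model: by default the job goes to the head of the queue;
    requeue_to_bottom mirrors LSB_REQUEUE_TO_BOTTOM being set in lsf.conf.
    """
    if requeue_to_bottom:
        pending.append(job_id)      # bottom of the queue
    else:
        pending.appendleft(job_id)  # head of the queue (default)


def eligible_for_requeue(exit_code_matches: bool, terminated_by_signal: bool) -> bool:
    """A job terminated by a signal is never requeued, even if its exit code matches."""
    return exit_code_matches and not terminated_by_signal


pending = deque(["job2", "job3"])
requeue_job(pending, "job1")
print(list(pending))  # ['job1', 'job2', 'job3']
```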
The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list.
For example:
REQUEUE_EXIT_VALUES=all ~1 ~2 EXCLUDE(9)
Jobs that exit with any exit code except 1 and 2 are requeued. Jobs that exit with code 9 are requeued as exclusive jobs.
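One way to see how all, ~, and EXCLUDE() combine is to parse the value into sets of exit codes. The Python sketch below is a hypothetical model written for this example (parse_requeue_exit_values, requeue_action, and RequeuePolicy are not part of LSF). It treats all as 0 through 255, following the typical range given above, and it accepts one or more space-separated codes inside EXCLUDE(), although only the single-code form appears in the example.

```python
import re
from typing import NamedTuple

# Matches EXCLUDE(...) groups such as "EXCLUDE(9)" and captures the codes inside.
_EXCLUDE = re.compile(r"EXCLUDE\(([^)]*)\)")


class RequeuePolicy(NamedTuple):
    requeue: frozenset    # exit codes that cause a normal requeue
    exclusive: frozenset  # exit codes that cause a requeue as an exclusive job


def parse_requeue_exit_values(value: str) -> RequeuePolicy:
    """Hypothetical parser for a REQUEUE_EXIT_VALUES string."""
    exclusive = frozenset(
        int(code) for group in _EXCLUDE.findall(value) for code in group.split()
    )
    requeue = set()
    removed = set()
    # Tokenize whatever remains after the EXCLUDE(...) groups are stripped out.
    for token in _EXCLUDE.sub(" ", value).split():
        if token == "all":
            requeue.update(range(256))  # assumes the typical 0-255 range
        elif token.startswith("~"):
            removed.add(int(token[1:]))  # "~N" removes N from the list
        else:
            requeue.add(int(token))
    # Keep the two sets disjoint so each exit code maps to exactly one action.
    return RequeuePolicy(frozenset(requeue - removed - exclusive), exclusive)


def requeue_action(policy: RequeuePolicy, exit_code: int) -> str:
    """Classify a job's exit code under the parsed policy."""
    if exit_code in policy.exclusive:
        return "requeue-exclusive"
    if exit_code in policy.requeue:
        return "requeue"
    return "exit"


policy = parse_requeue_exit_values("all ~1 ~2 EXCLUDE(9)")
print(requeue_action(policy, 1), requeue_action(policy, 9), requeue_action(policy, 3))
# prints: exit requeue-exclusive requeue
```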
Configure automatic job requeue
1 To configure automatic job requeue, set REQUEUE_EXIT_VALUES in the queue definition (lsb.queues) or in an application profile (lsb.applications) and specify the exit codes that cause the job to be requeued.
Application-level exit values override queue-level values. Job-level exit values (bsub -Q) override application-level and queue-level values.
Begin Queue
...
REQUEUE_EXIT_VALUES = 99 100
...
End Queue
This configuration enables jobs that exit with 99 or 100 to be requeued.
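The override order for exit values amounts to a simple first-match lookup. The Python sketch below assumes that "override" means the higher-level value replaces the lower-level value entirely rather than being merged with it. effective_exit_values and requeues_on are hypothetical helper names, and requeues_on handles only plain numeric lists such as 99 100, not all, ~, or EXCLUDE().

```python
from typing import Optional


def effective_exit_values(
    queue: Optional[str] = None,
    application: Optional[str] = None,
    job: Optional[str] = None,
) -> Optional[str]:
    """Return the REQUEUE_EXIT_VALUES string that applies to a job.

    Hypothetical helper. Precedence: job level (bsub -Q) first, then the
    application profile (lsb.applications), then the queue (lsb.queues).
    None means "not set at that level".
    """
    for value in (job, application, queue):
        if value is not None:
            return value
    return None


def requeues_on(exit_code: int, values: Optional[str]) -> bool:
    """True if exit_code appears in a plain numeric list such as "99 100"."""
    if values is None:
        return False
    return exit_code in {int(token) for token in values.split()}


# Queue-level setting from the lsb.queues example above.
values = effective_exit_values(queue="99 100")
print(requeues_on(99, values), requeues_on(1, values))  # True False
```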
Control how many times a job can be requeued
By default, if a job fails and its exit value falls into REQUEUE_EXIT_VALUES, LSF requeues the job automatically. Jobs that fail repeatedly are requeued indefinitely by default.
1 To limit the number of times a failed job is requeued, set MAX_JOB_REQUEUE cluster-wide (lsb.params), in the queue definition (lsb.queues), or in an application profile (lsb.applications). Specify an integer greater than zero (0).
MAX_JOB_REQUEUE in lsb.applications overrides lsb.queues, and lsb.queues overrides lsb.params. Specifying a job-level exit value using bsub -Q does not affect your MAX_JOB_REQUEUE settings.
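The MAX_JOB_REQUEUE precedence and the limit check can be sketched the same way. In this hypothetical Python model (effective_max_requeue and can_requeue_again are illustrative names, not LSF functions), None means that no configuration file sets a limit, which the sketch treats as no limit. The model covers only the three configuration files and assumes that a job with a limit of N can be requeued N times before it is allowed to exit.

```python
from typing import Optional


def effective_max_requeue(
    params: Optional[int] = None,
    queue: Optional[int] = None,
    application: Optional[int] = None,
) -> Optional[int]:
    """Resolve MAX_JOB_REQUEUE: lsb.applications > lsb.queues > lsb.params.

    Hypothetical helper. Returns None when no level sets a limit.
    """
    for value in (application, queue, params):
        if value is not None:
            if value <= 0:
                raise ValueError("MAX_JOB_REQUEUE must be an integer greater than 0")
            return value
    return None


def can_requeue_again(times_requeued: int, max_requeue: Optional[int]) -> bool:
    """True if a job already requeued times_requeued times may be requeued again."""
    return max_requeue is None or times_requeued < max_requeue


# A job that keeps failing with a requeue exit code: count how often it is requeued.
limit = effective_max_requeue(params=10, queue=5, application=3)
requeues = 0
while can_requeue_again(requeues, limit):
    requeues += 1
print(limit, requeues)  # 3 3
```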