LSF Version 7.3 - Release Notes for Platform LSF
To limit the number of times a failed job is requeued, set MAX_JOB_REQUEUE cluster-wide
(lsb.params), in the queue definition (lsb.queues), or in an application profile
(lsb.applications).
Specify an integer greater than zero (0).
MAX_JOB_REQUEUE in lsb.applications overrides lsb.queues, and lsb.queues
overrides lsb.params configuration.
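The override order above can be expressed as a lookup that returns the most specific level that sets the parameter. This is a minimal Python sketch, not LSF code. The function name and its arguments are hypothetical, and None stands for "not set at this level". The sketch rejects non-positive values only to mirror the "greater than zero" rule. How LSF itself treats an invalid value is not described here.

```python
from typing import Optional


def effective_max_job_requeue(
    params: Optional[int] = None,       # value from lsb.params (cluster-wide)
    queue: Optional[int] = None,        # value from lsb.queues
    application: Optional[int] = None,  # value from lsb.applications
) -> Optional[int]:
    """Return the MAX_JOB_REQUEUE value that applies to a job (hypothetical helper).

    lsb.applications overrides lsb.queues, and lsb.queues overrides lsb.params.
    """
    for value in (application, queue, params):  # most specific level first
        if value is not None:
            if value <= 0:
                # This sketch simply rejects values that are not greater than zero.
                raise ValueError("MAX_JOB_REQUEUE must be an integer greater than 0")
            return value
    return None  # not configured at any level


print(effective_max_job_requeue(params=5, queue=3, application=2))  # application wins
```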
When MAX_JOB_REQUEUE is set, if a job fails and its exit value falls into
REQUEUE_EXIT_VALUES, the number of times the job has been requeued is increased by
1 and the job is requeued. When the requeue limit is reached, the job is suspended with PSUSP
status. If a job fails and its exit value is not specified in REQUEUE_EXIT_VALUES, the job
is not requeued.
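The requeue counting rule can be simulated in a few lines. This is a minimal sketch, not LSF's implementation. The Job class and handle_failed_job function are hypothetical, and the status strings only borrow LSF job-state names. One assumption is labeled in the code: the text does not say exactly when "the limit is reached" takes effect, so the sketch requeues a job up to MAX_JOB_REQUEUE times and suspends it on the next qualifying failure.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Job:
    """Minimal stand-in for a job record (hypothetical, not an LSF API)."""
    requeue_count: int = 0
    status: str = "RUN"


def handle_failed_job(job: Job, exit_value: int, requeue_exit_values: Set[int],
                      max_job_requeue: Optional[int]) -> str:
    """Apply the requeue rules to a job that has just failed; return its new status."""
    if exit_value not in requeue_exit_values:
        # Exit value not listed in REQUEUE_EXIT_VALUES: the job is not requeued.
        job.status = "EXIT"
    elif max_job_requeue is not None and job.requeue_count >= max_job_requeue:
        # Assumption: once the job has been requeued max_job_requeue times,
        # the next qualifying failure suspends it instead of requeuing it.
        job.status = "PSUSP"
    else:
        # Count this requeue (increase by 1) and put the job back in the queue.
        job.requeue_count += 1
        job.status = "PEND"
    return job.status


job = Job()
for _ in range(3):
    print(handle_failed_job(job, exit_value=1, requeue_exit_values={1}, max_job_requeue=2))
```

With a limit of 2, the three failures above print PEND, PEND, and then PSUSP.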
Automatic job requeue
In REQUEUE_EXIT_VALUES, the reserved keyword all specifies all exit codes. Exit codes are
typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list.
For example:
REQUEUE_EXIT_VALUES=all ~1 ~2 EXCLUDE(9)
Jobs that exit with any exit code except 1 and 2 are requeued. Jobs that exit with code 9 are
requeued so that the failed job is not rerun on the same host (exclusive job requeue).
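The way the example value is interpreted can be sketched as a small parser that classifies an exit code. This is a hypothetical Python illustration, not LSF's parser. It assumes that all expands to the codes 0 through 255, that a ~ token removes a code from the normal requeue list, and that codes inside EXCLUDE(...) are requeued exclusively. The function names are invented for this example.

```python
import re
from typing import Set, Tuple

# Matches an EXCLUDE(...) group and captures the codes inside the parentheses.
_EXCLUDE = re.compile(r"EXCLUDE\(([^)]*)\)")


def parse_requeue_exit_values(spec: str) -> Tuple[Set[int], Set[int]]:
    """Return (requeue_codes, exclusive_codes) for a REQUEUE_EXIT_VALUES-style string."""
    exclusive: Set[int] = set()
    for group in _EXCLUDE.findall(spec):
        exclusive.update(int(tok) for tok in group.split())
    remainder = _EXCLUDE.sub(" ", spec)  # drop EXCLUDE(...) before tokenizing

    included: Set[int] = set()
    excluded: Set[int] = set()
    for tok in remainder.split():
        if tok == "all":
            included.update(range(256))  # assumption: "all" means codes 0..255
        elif tok.startswith("~"):
            excluded.add(int(tok[1:]))   # tilde removes a code from the list
        else:
            included.add(int(tok))
    return (included - excluded) | exclusive, exclusive


def classify_exit(code: int, spec: str) -> str:
    """Say how a job that exits with `code` is treated under `spec`."""
    requeue, exclusive = parse_requeue_exit_values(spec)
    if code in exclusive:
        return "exclusive requeue"  # not rerun on the same host
    if code in requeue:
        return "requeue"
    return "not requeued"


for code in (0, 1, 2, 9):
    print(code, classify_exit(code, "all ~1 ~2 EXCLUDE(9)"))
```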
Job-level automatic requeue
Use bsub -Q to submit a job that is automatically requeued if it exits with the specified exit
values. Use spaces to separate multiple exit codes. The reserved keyword all specifies all exit
codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes
from the list.
Job-level requeue exit values override application-level and queue-level configuration of the
parameter REQUEUE_EXIT_VALUES, if defined.
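The override rule for requeue exit values can be sketched as a first-match lookup, most specific level first. This Python helper is hypothetical. The text above states only that the job-level value (bsub -Q) overrides both the application-level and queue-level values. The sketch also assumes that the application profile overrides the queue when no job-level value is given, which mirrors the MAX_JOB_REQUEUE ordering but is not stated here for REQUEUE_EXIT_VALUES.

```python
from typing import Optional


def effective_requeue_exit_values(
    job: Optional[str] = None,          # value passed with bsub -Q
    application: Optional[str] = None,  # REQUEUE_EXIT_VALUES in the application profile
    queue: Optional[str] = None,        # REQUEUE_EXIT_VALUES in the queue
) -> Optional[str]:
    """Return the exit-value specification that applies (hypothetical helper).

    Assumption: application-level values override queue-level values.
    """
    for spec in (job, application, queue):
        if spec is not None:
            return spec
    return None  # no requeue exit values defined anywhere


print(effective_requeue_exit_values(job="all ~1 ~2 EXCLUDE(9)", queue="1 2"))
```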
Jobs that are requeued because of the specified exit codes share the same application profile
and queue with other jobs.
For example:
bsub -Q "all ~1 ~2 EXCLUDE(9)" myjob
Jobs that exit with any exit code except 1 and 2 are requeued. Jobs that exit with code 9 are
requeued as exclusive jobs.
Precedence of checkpointing options
If checkpoint-related configuration is specified in both the queue and an application profile,
the application profile setting overrides queue-level configuration.
If checkpoint-related configuration is specified in the queue, application profile, and at job
level:
• Application-level and job-level parameters are merged. If the same parameter is defined
at both the job level and in the application profile, the job-level value overrides the
application profile value.
• The merged result of job-level and application profile settings overrides queue-level
configuration.
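These merge rules can be modeled as two dictionary merges: first job-level settings over application profile settings, then that result over queue-level settings. This is a minimal Python sketch under one labeled assumption. It treats "overrides" as per-parameter, so a queue-level parameter survives if neither the job nor the application profile sets it. The keys "dir" and "period" are hypothetical placeholders, not LSF parameter names.

```python
from typing import Dict, Optional

Settings = Dict[str, str]


def effective_checkpoint_settings(
    queue: Optional[Settings] = None,
    application: Optional[Settings] = None,
    job: Optional[Settings] = None,
) -> Settings:
    """Combine checkpoint settings from the three levels (hypothetical helper).

    Assumption: overriding happens parameter by parameter.
    """
    # Step 1: merge application and job settings; job-level values win.
    app_and_job = {**(application or {}), **(job or {})}
    # Step 2: the merged result overrides queue-level configuration.
    return {**(queue or {}), **app_and_job}


print(effective_checkpoint_settings(
    queue={"dir": "/queue/ckpt", "period": "60"},
    application={"dir": "/app/ckpt"},
    job={"period": "30"},
))
```

Under this assumption, the example prints the application profile directory together with the job-level period.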