Platform LSF Administration Guide Version 6.2
Administering Platform LSF
About Job Requeue
A networked computing environment is vulnerable to failures and temporary
conditions in network services or processor resources. For example, you might get NFS
stale handle errors, disk full errors, process table full errors, or network connectivity
problems. Your application can also be affected by external conditions such as software
license problems, or by an occasional failure caused by a bug in the application itself.
Such errors are temporary and tend to happen at one time but not another, or on
one host but not another. Without automatic recovery, all your jobs might exit because
of a temporary error, and you might not find out until 12 hours later.
LSF provides a way to recover automatically from temporary errors. You can configure
certain exit values so that a job that exits with one of those values is automatically
requeued as if it had not yet been dispatched, and is retried later. You can also
configure a queue so that a requeued job is not scheduled on hosts where the job
previously failed to run.
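
As an illustration, a queue definition along these lines could enable both behaviors.
This is a minimal sketch, not a recommended configuration: it assumes the
REQUEUE_EXIT_VALUES parameter in lsb.queues (this section does not name the parameter),
the queue name "requeue_demo" is hypothetical, and the exit values 99, 100, and 20 are
arbitrary examples that you would replace with the values your jobs actually return on
temporary failures.

```
# lsb.queues (sketch; queue name and exit values are illustrative assumptions)
Begin Queue
QUEUE_NAME          = requeue_demo
# Jobs exiting with 99 or 100 are requeued and retried later.
# Jobs exiting with 20 are requeued, but not scheduled again on
# the host where they failed.
REQUEUE_EXIT_VALUES = 99 100 EXCLUDE(20)
DESCRIPTION         = Requeues jobs that exit with a temporary-error value
End Queue
```

After editing lsb.queues, the change would typically be applied by reconfiguring the
batch system (for example, with badmin reconfig). Choose exit values that your
application returns only for recoverable conditions, so that jobs failing for permanent
reasons are not retried indefinitely.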