Platform LSF Administration Guide Version 6.2
Understanding Job States
Administering Platform LSF
WAIT state (chunk jobs)
If you have configured chunk job queues, members of a chunk job that are waiting to
run are displayed as WAIT by bjobs. Any jobs in WAIT status are included in the count
of pending jobs by bqueues and busers, even though the entire chunk job has been
dispatched and occupies a job slot. The bhosts command shows the single job slot
occupied by the entire chunk job in the number of jobs shown in the NJOBS column.
You can switch (bswitch) or migrate (bmig) a chunk job member in WAIT state to
another queue.
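The accounting described above can be sketched as a small model. This is an illustrative Python sketch, not LSF code: the ChunkJob class and its fields are hypothetical, and it assumes a dispatched chunk job has one running member while the remaining members wait.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkJob:
    """Hypothetical model of a dispatched chunk job (not an LSF data structure)."""
    member_states: List[str]  # one state per member, e.g. "RUN" or "WAIT"

    def pending_count(self) -> int:
        # bqueues and busers include WAIT members in their pending totals.
        return sum(1 for state in self.member_states if state == "WAIT")

    def slots_used(self) -> int:
        # bhosts shows a single job slot for the whole chunk job.
        return 1 if self.member_states else 0


# Assumed example: four members, one running and three waiting.
chunk = ChunkJob(member_states=["RUN", "WAIT", "WAIT", "WAIT"])
print(chunk.pending_count(), chunk.slots_used())
```

In this model the three waiting members add to the pending total even though the chunk job as a whole holds only one slot, which is why pending counts and slot counts can appear to disagree.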
Viewing wait status and wait reason
Use the bhist -l command to display jobs in WAIT status. Jobs are shown as
Waiting ...
The bjobs -l command does not display a WAIT reason in the list of pending jobs.
See Chapter 26, “Chunk Job Dispatch” for more information about chunk jobs.
Exited jobs
A job might terminate abnormally for various reasons. Job termination can happen from
any state. An abnormally terminated job goes into EXIT state. The situations where a
job terminates abnormally include:
◆ The job is cancelled by its owner or the LSF administrator while pending, or after
  being dispatched to a host.
◆ The job is not able to be dispatched before it reaches its termination deadline, and
  thus is aborted by LSF.
◆ The job fails to start successfully. For example, the wrong executable is specified by
  the user when the job is submitted.
◆ The job exits with a non-zero exit status.
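The last case in the list can be illustrated with a minimal Python sketch. It is not LSF code: the final_state helper is hypothetical, and the mapping assumes that a zero exit status corresponds to DONE and any non-zero status to EXIT.

```python
import subprocess
import sys


def final_state(exit_status: int) -> str:
    """Hypothetical helper: map a process exit status to a job state name."""
    # Assumption: zero exit status -> DONE; any non-zero status -> EXIT.
    return "DONE" if exit_status == 0 else "EXIT"


# Run a throwaway child process that deliberately exits with status 3.
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(result.returncode, final_state(result.returncode))
```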
You can configure hosts so that LSF detects an abnormally high rate of job exit from a
host. See “Handling Host-level Job Exceptions” on page 124 for more information.
Post-execution states
Some jobs may not be considered complete until some post-job processing is
performed. For example, a job may need to exit from a post-execution job script, clean
up job files, or transfer job output after the job completes.
The DONE or EXIT job states do not indicate whether post-processing is complete, so
jobs that depend on post-processing may start prematurely. Use the post_done and
post_err keywords on the bsub -w command to specify job dependency conditions
for job post-processing. The corresponding job states POST_DONE and POST_ERR
indicate the state of the post-processing.
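As a minimal sketch of such a dependency, the following Python helper assembles a bsub command line that waits on another job's post-processing. The helper name, the job ID 1234, and the ./next_step.sh command are hypothetical placeholders. The sketch only builds the argument list and does not submit anything.

```python
from typing import List


def bsub_after_post_processing(
    job_id: int, command: List[str], on_error: bool = False
) -> List[str]:
    """Build a bsub argument list that depends on another job's post-processing."""
    # post_done waits for successful post-processing; post_err for failed post-processing.
    keyword = "post_err" if on_error else "post_done"
    dependency = f"{keyword}({job_id})"
    return ["bsub", "-w", dependency, *command]


# Hypothetical job ID and command, for illustration only.
argv = bsub_after_post_processing(1234, ["./next_step.sh"])
print(" ".join(argv))
```

When typing the equivalent command in a shell, quote the dependency expression (for example "post_done(1234)") so that the shell does not interpret the parentheses.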
After the job completes, you cannot perform any job control on the post-processing.
Post-processing exit codes are not reported to LSF. The post-processing of a repetitive
job cannot be longer than the repetition period.
Viewing post-execution states
Use the bhist command to display the POST_DONE and POST_ERR states. The
resource usage of post-processing is not included in the job resource usage.
See Chapter 31, “Pre-Execution and Post-Execution Commands” for more information.