Platform LSF Reference Version 6.2
bjobs
EXIT
The job has terminated with a non-zero status – it may have been aborted
due to an error in its execution, or killed by its owner or the LSF
administrator.
For example, exit code 131 means that the job exceeded a configured
resource usage limit and LSF killed the job.
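The exit code 131 mentioned above can be read with a small helper. This is a sketch only, assuming the common UNIX convention that an exit code above 128 encodes 128 plus the number of the signal that ended the process; the text above states only what 131 means. The function name describe_exit_code is hypothetical and is not part of LSF.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a job exit code (hypothetical helper, not an LSF API).

    Assumption: codes above 128 follow the usual UNIX convention of
    128 + signal number. This convention is not stated in the text above.
    """
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # Signal names are platform dependent; fall back if unknown.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(131))
```

Under that assumption, 131 corresponds to signal number 3 (131 - 128). The signal name printed depends on the platform where the helper runs.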
UNKWN
mbatchd has lost contact with the sbatchd on the host on which the job
runs.
WAIT
For jobs submitted to a chunk job queue, members of a chunk job that are
waiting to run.
ZOMBI
A job will become ZOMBI if:
✧ A non-rerunnable job is killed by bkill while the sbatchd on the
  execution host is unreachable and the job is shown as UNKWN.
✧ The host on which a rerunnable job is running is unavailable and the job
  has been requeued by LSF with a new job ID, as if the job were submitted
  as a new job.
After the execution host becomes available, LSF will try to kill the ZOMBI
job. Upon successful termination of the ZOMBI job, the job’s status will be
changed to EXIT.
With MultiCluster, when a job running on a remote execution cluster
becomes a ZOMBI job, the execution cluster will treat the job the same way
as local ZOMBI jobs. In addition, it notifies the submission cluster that the
job is in ZOMBI state and the submission cluster requeues the job.
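The two ZOMBI conditions listed above can be restated as a predicate. This is an illustrative model only: the JobSnapshot class, its field names, and the becomes_zombi function are hypothetical and do not correspond to any LSF data structure or API.

```python
from dataclasses import dataclass


@dataclass
class JobSnapshot:
    """Hypothetical view of a job; field names are illustrative only."""
    rerunnable: bool
    status: str                        # for example "RUN" or "UNKWN"
    sbatchd_reachable: bool
    killed_by_bkill: bool = False
    host_available: bool = True
    requeued_with_new_id: bool = False


def becomes_zombi(job: JobSnapshot) -> bool:
    """Return True if either condition described above holds."""
    # Condition 1: a non-rerunnable job is killed by bkill while the
    # sbatchd is unreachable and the job is shown as UNKWN.
    killed_while_unknown = (
        not job.rerunnable
        and job.killed_by_bkill
        and not job.sbatchd_reachable
        and job.status == "UNKWN"
    )
    # Condition 2: the host of a rerunnable job is unavailable and the
    # job has been requeued with a new job ID.
    requeued_elsewhere = (
        job.rerunnable
        and not job.host_available
        and job.requeued_with_new_id
    )
    return killed_while_unknown or requeued_elsewhere
```

The sketch covers only how a job enters the ZOMBI state. The later change to EXIT, once the execution host is available again and the job is killed, is not modeled.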
RESOURCE USAGE
For the MultiCluster job forwarding model, this information is not shown if
MultiCluster resource usage updating is disabled.
The values for the current usage of a job include:
CPU time
Cumulative total CPU time in seconds of all processes in a job.
IDLE_FACTOR
Job idle information (CPU time/runtime) if JOB_IDLE is configured in
the queue, and the job has triggered an idle exception.
MEM
Total resident memory usage of all processes in a job, in MB.
SWAP
Total virtual memory usage of all processes in a job, in MB.
NTHREAD
Number of currently active threads of a job.
PGID
Currently active process group ID in a job.
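Among the usage values above, IDLE_FACTOR is described as CPU time divided by runtime. A minimal sketch of that ratio follows. The function names are hypothetical, and the comparison against a threshold is an assumption about how a JOB_IDLE value might be used, not something stated in the text above.

```python
def idle_factor(cpu_time_s: float, run_time_s: float) -> float:
    """Return CPU time / runtime, both in seconds (hypothetical helper)."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


def looks_idle(cpu_time_s: float, run_time_s: float, threshold: float) -> bool:
    """Assumption: a job counts as idle when its factor is below the threshold."""
    return idle_factor(cpu_time_s, run_time_s) < threshold


if __name__ == "__main__":
    # A job that used 30 CPU seconds over 600 seconds of runtime.
    print(idle_factor(30, 600))
    print(looks_idle(30, 600, threshold=0.10))
```

A job that used 30 seconds of CPU time over 600 seconds of runtime has a ratio of 30/600, or 0.05. A low ratio suggests the job spent most of its runtime not using the CPU.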