LSF Version 7.3 - Administering Platform LSF

Error and Event Logging
TIP: The integer values logged to the JOB_FINISH event in lsb.acct and the termination
reason keywords are mapped in lsbatch.h.
Restrictions
◆ If a queue-level JOB_CONTROL is configured, LSF cannot determine the result of the
  action. The termination reason only reflects what the termination reason could be
  in LSF.
◆ LSF is not guaranteed to catch external signals sent directly to the job.
◆ In MultiCluster, a brequeue request sent from the submission cluster is translated
  to TERM_OWNER or TERM_ADMIN in the remote execution cluster. The termination reason
  in the email notification sent from the execution cluster, as well as in lsb.acct,
  is set to TERM_OWNER or TERM_ADMIN.
Keyword displayed by  Termination reason                                                Integer value logged to
bacct                                                                                   JOB_FINISH in lsb.acct

TERM_ADMIN            Job killed by root or LSF administrator                           15
TERM_BUCKET_KILL      Job killed with bkill -b                                          23
TERM_CHKPNT           Job killed after checkpointing                                    13
TERM_CPULIMIT         Job killed after reaching LSF CPU usage limit                     12
TERM_CWD_NOTEXIST     Current working directory is not accessible or does not exist     25
                      on the execution host
TERM_DEADLINE         Job killed after deadline expires                                 6
TERM_EXTERNAL_SIGNAL  Job killed by a signal external to LSF                            17
TERM_FORCE_ADMIN      Job killed by root or LSF administrator without time for cleanup  9
TERM_FORCE_OWNER      Job killed by owner without time for cleanup                      8
TERM_LOAD             Job killed after load exceeds threshold                           3
TERM_MEMLIMIT         Job killed after reaching LSF memory usage limit                  16
TERM_OTHER            Member of a chunk job in WAIT state killed and requeued           4
                      after being switched to another queue
TERM_OWNER            Job killed by owner                                               14
TERM_PREEMPT          Job killed after preemption                                       1
TERM_PROCESSLIMIT     Job killed after reaching LSF process limit                       7
TERM_REQUEUE_ADMIN    Job killed and requeued by root or LSF administrator              11
TERM_REQUEUE_OWNER    Job killed and requeued by owner                                  10
TERM_RMS              Job exited from an RMS system error                               18
TERM_RUNLIMIT         Job killed after reaching LSF run time limit                      5
TERM_SLURM            Job terminated abnormally in SLURM (node failure)                 22
TERM_SWAP             Job killed after reaching LSF swap usage limit                    20
TERM_THREADLIMIT      Job killed after reaching LSF thread limit                        21
TERM_UNKNOWN          LSF cannot determine a termination reason; 0 is logged but        0
                      TERM_UNKNOWN is not displayed
TERM_WINDOW           Job killed after queue run window closed                          2
TERM_ZOMBIE           Job exited while LSF is not available                             19
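
When post-processing accounting data, it can help to translate the logged integer back to the keyword that bacct displays. The following is a minimal Python sketch of that lookup, transcribed from the table above. The names TERM_REASONS and term_keyword are hypothetical and are not part of LSF. The sketch does not parse lsb.acct itself, because the position of the value within a JOB_FINISH record is not covered here. The table lists no keyword for the value 24, so the sketch reports that value, and any other unlisted value, as unlisted.

```python
# Integer values logged to JOB_FINISH in lsb.acct, mapped to the keyword
# displayed by bacct. Transcribed from the table above; lsbatch.h remains
# the authoritative mapping.
TERM_REASONS = {
    0: "TERM_UNKNOWN",
    1: "TERM_PREEMPT",
    2: "TERM_WINDOW",
    3: "TERM_LOAD",
    4: "TERM_OTHER",
    5: "TERM_RUNLIMIT",
    6: "TERM_DEADLINE",
    7: "TERM_PROCESSLIMIT",
    8: "TERM_FORCE_OWNER",
    9: "TERM_FORCE_ADMIN",
    10: "TERM_REQUEUE_OWNER",
    11: "TERM_REQUEUE_ADMIN",
    12: "TERM_CPULIMIT",
    13: "TERM_CHKPNT",
    14: "TERM_OWNER",
    15: "TERM_ADMIN",
    16: "TERM_MEMLIMIT",
    17: "TERM_EXTERNAL_SIGNAL",
    18: "TERM_RMS",
    19: "TERM_ZOMBIE",
    20: "TERM_SWAP",
    21: "TERM_THREADLIMIT",
    22: "TERM_SLURM",
    23: "TERM_BUCKET_KILL",
    25: "TERM_CWD_NOTEXIST",
}


def term_keyword(value: int) -> str:
    """Return the keyword for a logged integer (hypothetical helper).

    Values that do not appear in the table are reported as UNLISTED(n)
    rather than guessed.
    """
    return TERM_REASONS.get(value, f"UNLISTED({value})")


if __name__ == "__main__":
    for logged in (15, 0, 24):
        print(logged, term_keyword(logged))
```

Note that although 0 maps to TERM_UNKNOWN here, bacct does not display that keyword, so a report that mimics bacct output may want to leave the reason blank for 0.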