LSF Version 7.3 - Administering Platform LSF

loadSched - -
loadStop - -
LSF job termination reason logging
When LSF takes action on a job, it may send multiple signals. To terminate a job,
LSF sends SIGINT, SIGTERM, and SIGKILL in succession until the job has
terminated. As a result, the job may exit with the exit value that corresponds to
any of those signals at the system level. Other actions may send "warning" signals
(such as SIGUSR2) to applications. For specific signal sequences, refer to the LSF
documentation for that feature.
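Because the termination sequence described above starts with signals that can be caught, a job can use them to clean up before SIGKILL arrives. The following is a minimal sketch, assuming the job is a Python program. The names received, on_terminate, and install_handlers are hypothetical and are not part of LSF.

```python
import signal

# Names of the catchable termination signals received so far.
# A job's main loop could poll this list and shut down cleanly.
received = []

def on_terminate(signum, frame):
    """Record the signal name instead of exiting abruptly."""
    received.append(signal.Signals(signum).name)

def install_handlers():
    """Register the handler for SIGINT and SIGTERM (call from the main thread)."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, on_terminate)
    # SIGKILL cannot be caught or ignored, so any cleanup has to finish
    # before the sequence reaches it.
```

A job would call install_handlers() once at startup and check received between units of work. How much time passes between the signals depends on the LSF configuration, which this sketch does not assume.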
Run bhist to see the actions that LSF takes on a job:
bhist -l 1798
Job <1798>, User <user1>, Command <sleep 10000>
Tue Feb 25 16:35:31: Submitted from host <hostA>, to Queue <normal>, CWD <$H
OME/lsf_7.0/conf/lsbatch/lsf_7.0/configdir>;
Tue Feb 25 16:35:51: Dispatched to <hostA>;
Tue Feb 25 16:35:51: Starting (Pid 12955);
Tue Feb 25 16:35:53: Running with execution home </home/user1>, Execution CWD <
/home/user1/Testing/lsf_7.0/conf/lsbatch/lsf_7.0/configdir>,
Execution Pid <12955>;
Tue Feb 25 16:38:20: Signal <KILL> requested by user or administrator <user1>;
Tue Feb 25 16:38:22: Exited with exit code 130. The CPU time used is 0.1 seconds;
Summary of time in seconds spent in various states by Tue Feb 25 16:38:22
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  20       0        151      0        0        0        171
This output shows that LSF itself sent the signal to terminate the job, and that the
job exited with exit code 130 (130 - 128 = 2, which is the signal number of SIGINT).
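The arithmetic above can be written as a small helper. This is an illustrative sketch, not an LSF tool: it assumes the common convention that an exit code above 128 means 128 plus a signal number, and the function name signal_from_exit_code is hypothetical. Signal numbers other than the standard low-numbered ones can differ between platforms.

```python
import signal

def signal_from_exit_code(exit_code):
    """Return the signal name implied by an exit code above 128, or None."""
    if exit_code <= 128:
        return None  # Ordinary exit status, not a signal-style code.
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # No signal with that number on this platform.

# Exit code reported by bhist for job 1798.
print(signal_from_exit_code(130))
```

For the job shown here the helper reports SIGINT, which matches the first signal in the termination sequence.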
When a job finishes, LSF reports the last job termination action it took against the
job and logs it into lsb.acct.
If a running job exits because of node failure, LSF sets the correct exit information
in lsb.acct, lsb.events, and the job output file.
View logged job exit information (bacct -l)
1. Use bacct -l to view job exit information logged to lsb.acct:
bacct -l 7265
Accounting information about jobs that are:
- submitted by all users.
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to all queues.
- accounted on all service classes.
------------------------------------------------------------------------------
Job <7265>, User <lsfadmin>, Project <default>, Status <EXIT>, Queue <normal>,