LSF Version 7.3 - Platform LSF Configuration Reference
20 0 151 0 0 0 171
Here we see that LSF itself sent the signal to terminate the job, and the job exits with code 130
(130 - 128 = 2, which is SIGINT).
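The subtraction above can be sketched in a few lines of Python. This is an illustrative helper, not part of LSF: the function name signal_from_exit_code is hypothetical, and signal names come from the platform where the script runs, which may differ from the execution host.

```python
import signal


def signal_from_exit_code(exit_code):
    """Return (signal_number, signal_name) for an exit code above 128, else None.

    Hypothetical helper: it only applies the "exit code minus 128" rule
    described in the text and does not query LSF.
    """
    if exit_code <= 128:
        # Codes up to 128 are treated here as ordinary exit statuses.
        return None
    signum = exit_code - 128
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # The local platform has no signal with this number.
        name = "UNKNOWN"
    return signum, name


if __name__ == "__main__":
    print(signal_from_exit_code(130))
```

On a typical Linux host, exit code 130 maps to signal number 2, SIGINT, matching the example in the text.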
When a job finishes, LSF reports the last job termination action it took against the job and
logs it into lsb.acct.
If a running job exits because of node failure, LSF sets the correct exit information in
lsb.acct, lsb.events, and the job output file.
View logged job exit information (bacct -l)
1. Use bacct -l to view job exit information logged to lsb.acct:
bacct -l 7265
Accounting information about jobs that are:
- submitted by all users.
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to all queues.
- accounted on all service classes.
------------------------------------------------------------------------------
Job <7265>, User <lsfadmin>, Project <default>, Status <EXIT>, Queue <normal>,
Command <srun sleep 100000>
Thu Sep 16 15:22:09: Submitted from host <hostA>, CWD <$HOME>;
Thu Sep 16 15:22:20: Dispatched to 4 Hosts/Processors <4*hostA>;
Thu Sep 16 15:23:21: Completed <exit>; TERM_RUNLIMIT: job killed after reaching
LSF run time limit.
Accounting information about this job:
Share group charged </lsfadmin>
CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
 0.04       11             72     exit         0.0006     0K      0K
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 0 Total number of exited jobs: 1
Total CPU time consumed: 0.0 Average CPU time consumed: 0.0
Maximum CPU time of a job: 0.0 Minimum CPU time of a job: 0.0
Total wait time in queues: 11.0
Average wait time in queue: 11.0
Maximum wait time in queue: 11.0 Minimum wait time in queue: 11.0
Average turnaround time: 72 (seconds/job)
Maximum turnaround time: 72 Minimum turnaround time: 72
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.00 Minimum hog factor of a job: 0.00
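As a cross-check on the sample output, the WAIT, TURNAROUND, and HOG_FACTOR values can be recomputed from the three timestamps and the CPU_T value shown for job 7265. The sketch below is illustrative only: the function name job_timing is hypothetical, and it assumes all three events fall on the same day, as they do in this sample.

```python
from datetime import datetime

TIME_FMT = "%H:%M:%S"  # time-of-day only; assumes all events share one date


def job_timing(submitted, dispatched, completed, cpu_seconds):
    """Return (wait, turnaround, hog_factor) from time-of-day strings.

    wait        = dispatch time minus submission time, in seconds
    turnaround  = completion time minus submission time, in seconds
    hog_factor  = cpu time / turnaround time, as stated in the summary
    """
    t_sub = datetime.strptime(submitted, TIME_FMT)
    t_disp = datetime.strptime(dispatched, TIME_FMT)
    t_done = datetime.strptime(completed, TIME_FMT)
    wait = (t_disp - t_sub).total_seconds()
    turnaround = (t_done - t_sub).total_seconds()
    hog_factor = cpu_seconds / turnaround
    return wait, turnaround, hog_factor


if __name__ == "__main__":
    # Values copied from the sample bacct -l output for job 7265.
    wait, turnaround, hog = job_timing("15:22:09", "15:22:20", "15:23:21", 0.04)
    print(wait, turnaround, round(hog, 4))
```

With the sample values, the wait is 11 seconds, the turnaround is 72 seconds, and 0.04 / 72 rounds to the 0.0006 hog factor reported in the accounting line.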
Termination reasons displayed by bacct
When LSF detects that a job is terminated, bacct -l displays one of the following termination
reasons:
Keyword displayed     Termination reason                               Integer value logged to
by bacct                                                               JOB_FINISH in lsb.acct
TERM_ADMIN            Job killed by root or LSF administrator          15
TERM_BUCKET_KILL      Job killed with bkill -b                         23
TERM_CHKPNT           Job killed after checkpointing                   13
TERM_CPULIMIT         Job killed after reaching LSF CPU usage limit    12
TERM_CWD_NOTEXIST     Current working directory is not accessible      25
                      or does not exist on the execution host
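A script that post-processes bacct -l output could pull the termination keyword from the "Completed" line and look up its integer value. The sketch below is an assumption-laden illustration: the names TERM_REASON_CODES and termination_keyword are hypothetical, the regular expression is fitted to the sample output shown earlier rather than to a documented format, and the dictionary holds only the five rows visible in this part of the table.

```python
import re

# Only the rows shown in this part of the table; other keywords are not listed here.
TERM_REASON_CODES = {
    "TERM_ADMIN": 15,
    "TERM_BUCKET_KILL": 23,
    "TERM_CHKPNT": 13,
    "TERM_CPULIMIT": 12,
    "TERM_CWD_NOTEXIST": 25,
}

# Pattern fitted to the sample line "Completed <exit>; TERM_RUNLIMIT: ...".
COMPLETED_RE = re.compile(r"Completed <\w+>;\s*(TERM_[A-Z_]+):")


def termination_keyword(bacct_output):
    """Return the TERM_* keyword found in bacct -l output, or None."""
    match = COMPLETED_RE.search(bacct_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "Thu Sep 16 15:23:21: Completed <exit>; TERM_RUNLIMIT: job killed after reaching\n"
        "LSF run time limit.\n"
    )
    keyword = termination_keyword(sample)
    # .get returns None for keywords outside the partial mapping above.
    print(keyword, TERM_REASON_CODES.get(keyword))
```

Using .get keeps the lookup safe for keywords such as TERM_RUNLIMIT, whose integer value does not appear in the rows shown here.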