LSF Version 7.3 - Using Platform LSF HPC
Controlling and Monitoring Jobs Being Debugged in TotalView
Controlling jobs
While your job is running and you are using TotalView to debug it, you cannot use LSF
job control commands:
◆ bchkpnt and bmig are not supported.
◆ Default TotalView signal processing prevents bstop and bresume from suspending and resuming jobs, and bkill from terminating jobs.
◆ brequeue causes TotalView to display all jobs in error status. Click Go and the jobs will rerun.
◆ Job rerun within TotalView is not supported. Do not submit jobs for debugging to a rerunnable queue.
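One way to follow the last restriction is to inspect the queue before submitting. The sketch below is illustrative only: the helper names queue_is_rerunnable and submit_debug_job are hypothetical, and the "RERUNNABLE : yes" line it looks for in bqueues -l output is an assumption about the report layout that should be checked against your cluster.

```shell
# Sketch: decide from `bqueues -l <queue>` output (read on stdin) whether the
# queue is rerunnable. The "RERUNNABLE : yes" layout matched here is an
# assumption about the report format; adjust the pattern to your cluster.
queue_is_rerunnable() {
    grep -Eiq 'RERUNNABLE[[:space:]]*:[[:space:]]*yes'
}

# Hypothetical wrapper: refuse to submit a debug job to a rerunnable queue.
# Usage (requires LSF): submit_debug_job hpc_linux -n 2 -a tvmpich_gm mpirun.lsf ./cpi
submit_debug_job() {
    queue=$1
    shift
    if bqueues -l "$queue" | queue_is_rerunnable; then
        echo "queue $queue is rerunnable; not submitting a TotalView job" >&2
        return 1
    fi
    bsub -q "$queue" "$@"
}
```

The check is kept separate from the submission so that it can be tried on saved bqueues output without access to a cluster.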
Monitoring jobs
Use bjobs to see the resource usage of jobs running under TotalView:
bsub -n 2 -a tvmpich_gm mpirun.lsf ./cpi -tvopt -no_ask_on_dlopen
Job <365> is submitted to queue <hpc_linux>.
bjobs -l 365
Job <365>, User <user1>, Project <default>, Status <DONE>, Queue <hpc_linux>,
                     Command <totalview pam -no_ask_on_dlopen -a -g 1 -tv gmmpirun_wrapper ./cpi>
Fri Oct 11 15:46:47: Submitted from host <hostA>, CWD <$HOME>, 2 Processors Requested,
                     Requested Resources <select[ (gm_ports > 0) ] rusage[gm_ports=1:duration=10]>;
Fri Oct 11 15:46:58: Started on 2 Hosts/Processors <hostA> <hostB>,
                     Execution Home </home/user1>, Execution CWD </home/user1>;
Fri Oct 11 15:53:07: Done successfully. The CPU time used is 69.7 seconds.

SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched   -     -    -    -   -   -   -   -    -    -    -
loadStop    -     -    -    -   -   -   -   -    -    -    -

           adapter_windows
loadSched   -  -  -
loadStop    -  -  -
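If only the CPU time is of interest, it can be pulled out of the long listing. This is a minimal sketch that assumes the "The CPU time used is N seconds." wording shown in the sample output above; the function name cpu_time_from_bjobs is hypothetical.

```shell
# Read `bjobs -l` output on stdin and print the CPU time in seconds.
# Assumes the "The CPU time used is N seconds." wording from the sample
# output; prints nothing if that sentence is absent (job still running).
cpu_time_from_bjobs() {
    # bjobs -l folds long records across lines, so join the lines and
    # squeeze runs of spaces before matching.
    tr '\n' ' ' | tr -s ' ' |
        sed -n 's/.*The CPU time used is \([0-9.][0-9.]*\) seconds.*/\1/p'
}

# Usage (requires an LSF cluster):
#   bjobs -l 365 | cpu_time_from_bjobs
```

Joining the lines first means the sentence is still found when bjobs wraps it in the middle.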
% bsub -a tvpoe -n 4 mpirun.lsf $JOB
Job <341> is submitted to queue <hpc_ibm>.