LSF Version 7.3 - Platform LSF Configuration Reference
Commands to monitor
Command Description
bacct -l
• Displays accounting statistics for finished jobs, including termination reasons. TERM_CHKPNT indicates that a job was checkpointed and killed.
• If JOB_CONTROLS is defined for a queue, LSF does not display the result of the action.
bhist -l
• Displays the actions that LSF took on a completed job, including job checkpoint, restart, and migration to another host.
bjobs -l
• Displays information about pending, running, and suspended jobs, including the checkpoint directory, the checkpoint period, and the checkpoint method (either application or default).
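The monitoring commands in the table above can be scripted. The Python sketch below is a hypothetical helper, not part of LSF. It assumes that the LSF commands are on PATH, that each command accepts the job ID as its final argument, and that the TERM_CHKPNT reason code appears verbatim in `bacct -l` output. The names `monitor_argv`, `run_monitor`, and `killed_after_checkpoint` are illustrative only.

```python
import subprocess
from typing import Dict, List

# Long-format monitoring commands from the table; the job ID is appended per call.
MONITOR_COMMANDS: Dict[str, List[str]] = {
    "bjobs": ["bjobs", "-l"],  # pending, running, suspended jobs (checkpoint dir, period, method)
    "bhist": ["bhist", "-l"],  # history: checkpoint, restart, migration
    "bacct": ["bacct", "-l"],  # accounting for finished jobs, including termination reasons
}


def monitor_argv(job_id: str) -> Dict[str, List[str]]:
    """Build one argv list per monitoring command (no shell quoting involved)."""
    return {name: base + [job_id] for name, base in MONITOR_COMMANDS.items()}


def run_monitor(job_id: str) -> Dict[str, str]:
    """Run each monitoring command and capture its stdout. Requires LSF on PATH."""
    results: Dict[str, str] = {}
    for name, argv in monitor_argv(job_id).items():
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[name] = proc.stdout
    return results


def killed_after_checkpoint(bacct_output: str) -> bool:
    """Assumption: the TERM_CHKPNT reason code appears verbatim in bacct -l output."""
    return "TERM_CHKPNT" in bacct_output
```

For example, `killed_after_checkpoint(run_monitor("1234")["bacct"])` would report whether finished job 1234 was checkpointed and killed. Passing argument lists rather than shell strings keeps job IDs from being reinterpreted by a shell.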
Commands to control
Command Description
bmod -k "checkpoint_dir [checkpoint_period] [method=echkpnt_application]"
• Resubmits a job and changes the checkpoint directory, checkpoint period, and the checkpoint and restart executables associated with the job.
bmod -kn
• Dissociates the checkpoint directory from a job, which makes the job no longer checkpointable.
bchkpnt
• Checkpoints the most recently submitted checkpointable job. To checkpoint other jobs, identify them with bchkpnt options such as a job ID.
bchkpnt -p checkpoint_period job_ID
• Checkpoints a job immediately and changes the checkpoint period for the job.
bchkpnt -k job_ID
• Checkpoints a job immediately and kills the job.
bchkpnt -p 0 job_ID
• Checkpoints a job immediately and disables periodic checkpointing.
brestart
• Restarts a checkpointed job on the first available host.
brestart -m
• Restarts a checkpointed job on the specified host or host group.
bmig
• Migrates one or more running jobs from one host to another. The jobs must be checkpointable or rerunnable.
• Checkpoints, kills, and restarts one or more checkpointable jobs.
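The control commands in the table above take their checkpoint settings as arguments, and `bmod -k` takes the directory, optional period, and optional method as one quoted argument. The Python sketch below is a hypothetical set of helpers, not part of LSF, that builds argv lists for these commands. It assumes that the LSF commands are on PATH and that `brestart` takes the checkpoint directory and job ID as positional arguments in that order; that argument order is an assumption not stated in this table. All function names are illustrative.

```python
import subprocess
from typing import List, Optional


def bmod_checkpoint_argv(job_id: str, checkpoint_dir: str,
                         period: Optional[int] = None,
                         method: Optional[str] = None) -> List[str]:
    """bmod -k "checkpoint_dir [checkpoint_period] [method=...]" job_id.

    The whole -k specification is ONE argument, so its parts are joined into
    a single space-separated string. A checkpoint_dir containing spaces would
    therefore not parse correctly.
    """
    spec = [checkpoint_dir]
    if period is not None:
        spec.append(str(period))
    if method is not None:
        spec.append(f"method={method}")
    return ["bmod", "-k", " ".join(spec), job_id]


def bmod_no_checkpoint_argv(job_id: str) -> List[str]:
    """bmod -kn job_id: dissociate the checkpoint directory from the job."""
    return ["bmod", "-kn", job_id]


def bchkpnt_argv(job_id: str, period: Optional[int] = None,
                 kill: bool = False) -> List[str]:
    """bchkpnt [-k] [-p period] job_id: checkpoint the job immediately.

    period=0 disables periodic checkpointing; kill=True adds -k so the job is
    killed after it is checkpointed.
    """
    argv = ["bchkpnt"]
    if kill:
        argv.append("-k")
    if period is not None:
        argv += ["-p", str(period)]
    argv.append(job_id)
    return argv


def brestart_argv(checkpoint_dir: str, job_id: str,
                  host: Optional[str] = None) -> List[str]:
    """brestart [-m host] checkpoint_dir job_id (positional order is an assumption)."""
    argv = ["brestart"]
    if host is not None:
        argv += ["-m", host]
    argv += [checkpoint_dir, job_id]
    return argv


def run(argv: List[str]) -> int:
    """Run one command and return its exit code. Requires LSF on PATH."""
    return subprocess.run(argv, check=False).returncode
```

For example, `run(bchkpnt_argv("1234", period=0))` would checkpoint job 1234 and disable periodic checkpointing. Building the `-k` specification as a single list element avoids the shell quoting that the command-line form requires.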
Feature: Job checkpoint and restart