Platform LSF Reference Version 6.2
bchkpnt
checkpoints one or more checkpointable jobs
SYNOPSIS
bchkpnt [-f] [-k] [-p minutes | -p 0] [job_ID | "job_ID[index_list]"]...
bchkpnt [-f] [-k] [-p minutes | -p 0] [-J job_name]
[-m host_name | -m host_group] [-q queue_name] [-u "user_name" | -u all] [0]
bchkpnt [-h | -V]
DESCRIPTION
Checkpoints your running (RUN) or suspended (SSUSP, USUSP, and PSUSP)
checkpointable jobs. LSF administrators and root can checkpoint jobs submitted by
other users.
By default, checkpoints one job: the most recently submitted job, or the most recently
submitted job that also satisfies the other specified options (-m, -q, -u, and -J).
Specify 0 (zero) to checkpoint multiple jobs. Specify a job ID to checkpoint one specific job.
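The selection rules above can be sketched as a small helper that assembles a bchkpnt argument list for use from a script. This is an illustrative sketch only, not part of LSF: the function name bchkpnt_argv, its parameters, and the queue name "night" are hypothetical, and only options shown in the SYNOPSIS are used. Keeping job IDs separate from the filter options mirrors the two SYNOPSIS forms and is a simplification of this sketch, not a documented restriction.

```python
def bchkpnt_argv(job_ids=(), job_name=None, host=None, queue=None,
                 user=None, all_matching=False):
    """Assemble a bchkpnt argument list (hypothetical helper, not an LSF API)."""
    filters = [("-J", job_name), ("-m", host), ("-q", queue), ("-u", user)]
    given = [(flag, value) for flag, value in filters if value is not None]

    # This sketch keeps the two SYNOPSIS forms apart (an assumption of the sketch).
    if job_ids and (given or all_matching):
        raise ValueError("pass either job IDs or filter options, not both")

    argv = ["bchkpnt"]
    if job_ids:
        # First form: one or more explicit job IDs (or "job_ID[index_list]").
        return argv + [str(job_id) for job_id in job_ids]

    # Second form: optional filters, then 0 to select every matching job.
    for flag, value in given:
        argv += [flag, value]
    if all_matching:
        argv.append("0")
    return argv


# No arguments: bchkpnt picks the most recently submitted job.
print(bchkpnt_argv())                      # ['bchkpnt']
# One specific job.
print(bchkpnt_argv(job_ids=[1234]))        # ['bchkpnt', '1234']
# All jobs of all users in the (hypothetical) queue "night".
print(bchkpnt_argv(queue="night", user="all", all_matching=True))
# ['bchkpnt', '-q', 'night', '-u', 'all', '0']
```

On a host where LSF is installed, the resulting list could be passed to subprocess.run(argv, check=True); that call is left out here so the sketch runs anywhere.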
By default, jobs continue to execute after they have been checkpointed.
To submit a checkpointable job, use bsub -k or submit the job to a checkpoint queue
(CHKPNT in lsb.queues(5)). Use brestart(1) to start checkpointed jobs.
LSF invokes the echkpnt(8) executable found in LSF_SERVERDIR to perform the
checkpoint.
Only running members of a chunk job can be checkpointed. For chunk jobs in WAIT
state, mbatchd rejects the checkpoint request.
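The job-state rules in this section can be summarized as a small predicate, which a script might use to skip jobs before calling bchkpnt. This is a sketch under stated assumptions: the function name state_allows_checkpoint and the chunk_member parameter are hypothetical, the caller is assumed to already know the job's state string, and the chunk-job branch is a literal reading of "only running members". The check covers state only; the job must still have been submitted as checkpointable.

```python
# Suspended states named in the DESCRIPTION.
SUSPENDED_STATES = {"SSUSP", "USUSP", "PSUSP"}


def state_allows_checkpoint(state, chunk_member=False):
    """Return True if a job in this state may be checkpointed (hypothetical helper)."""
    if state == "RUN":
        return True
    if chunk_member:
        # Only running members of a chunk job qualify; WAIT is rejected by mbatchd.
        return False
    return state in SUSPENDED_STATES


print(state_allows_checkpoint("RUN"))                       # True
print(state_allows_checkpoint("USUSP"))                     # True
print(state_allows_checkpoint("WAIT", chunk_member=True))   # False
```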
OPTIONS
0
(Zero). Checkpoints multiple jobs: all the jobs that satisfy the other specified
options (-m, -q, -u, and -J).
-f
Forces a job to be checkpointed even if non-checkpointable conditions exist (these
conditions are OS-specific).
-k
Kills a job after it has been successfully checkpointed.
-p minutes | -p 0
Enables periodic checkpointing and specifies the checkpoint period, or modifies the
checkpoint period of a checkpointed job. Specify -p 0 (zero) to disable periodic
checkpointing.
Checkpointing is a resource-intensive operation. To allow your job to make progress
while still providing fault tolerance, specify a checkpoint period of 30 minutes or longer.
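As an illustration of the -p guidance above, the following sketch builds the -p arguments and flags periods shorter than the suggested 30 minutes. The helper names period_args and is_below_recommended are hypothetical, and treating the period as a whole number of minutes is an assumption of this sketch rather than something stated here.

```python
RECOMMENDED_MINUTES = 30  # the minimum period suggested in the text above


def period_args(minutes):
    """Return the -p arguments for bchkpnt (hypothetical helper, not an LSF API)."""
    # Assumption of this sketch: the period is a non-negative whole number.
    if isinstance(minutes, bool) or not isinstance(minutes, int) or minutes < 0:
        raise ValueError("period must be a non-negative whole number of minutes")
    return ["-p", str(minutes)]


def is_below_recommended(minutes):
    """True for a nonzero period shorter than the recommended minimum."""
    # A period of 0 disables periodic checkpointing, so it is never flagged.
    return 0 < minutes < RECOMMENDED_MINUTES


print(period_args(45))            # ['-p', '45']
print(period_args(0))             # ['-p', '0']  (disables periodic checkpointing)
print(is_below_recommended(10))   # True
```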
-J job_name
Only checkpoints jobs that have the specified job name.