LSF Version 7.3 - Platform LSF Configuration Reference
Configuration file: lsb.queues
Parameter and syntax: CHKPNT=chkpnt_dir [chkpnt_period]
Behavior:
• All jobs submitted to the queue are checkpointable. LSF writes the checkpoint files,
  which contain job state information, to the checkpoint directory. The checkpoint
  directory can contain checkpoint files for multiple jobs.
• The specified checkpoint directory must already exist. LSF does not create the
  checkpoint directory.
• The user account that submits the job must have read and write permissions for the
  checkpoint directory.
• For the job to restart on another execution host, both the original and new hosts must
  have network connectivity to the checkpoint directory.
• If the queue administrator specifies a checkpoint period, in minutes, LSF creates a
  checkpoint file every chkpnt_period minutes during job execution.
  Note: There is no default value for checkpoint period. You must specify a checkpoint
  period if you want to enable periodic checkpointing.
• If a user specifies a checkpoint directory and checkpoint period at the job level with
  bsub -k, the job-level values override the queue-level values.

Configuration file: lsb.applications
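The directory requirements listed for the lsb.queues CHKPNT parameter can be checked before the queue is put into service. The following is a minimal sketch, not an LSF tool: the function name check_chkpnt_dir, the queue name chkpnt_queue, the path /share/lsf/chkpnt, and the 30-minute period are all assumptions made for illustration.

```shell
# Hypothetical lsb.queues entry that this check would accompany
# (queue name, path, and period are illustrative assumptions):
#   Begin Queue
#   QUEUE_NAME = chkpnt_queue
#   CHKPNT     = /share/lsf/chkpnt 30
#   End Queue

# Verify that a checkpoint directory exists and is readable and
# writable by the account running this script.
check_chkpnt_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        # LSF does not create the directory, so it must already exist.
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        # The submitting user needs both read and write permission.
        echo "no read/write access: $dir"
        return 2
    fi
    echo "ok: $dir"
}
```

Run the check as the user who will submit jobs, and on each host that may execute or restart the job, because the check only reflects the account and host on which it runs.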
Configuration to enable kernel-level checkpoint and restart
Kernel-level checkpoint and restart is enabled by default. LSF users make a job checkpointable
in one of two ways: by submitting the job with bsub -k and specifying a checkpoint directory, or
by submitting the job to a queue that defines a checkpoint directory in the CHKPNT parameter.
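The two submission forms can be sketched as command lines. The helper functions below only print the bsub command they would run, so the sketch works on a machine without LSF installed. The function names, the directory /share/lsf/chkpnt, the queue name chkpnt_queue, and the job ./my_job are assumptions for illustration.

```shell
# Print the job-level form: bsub -k "chkpnt_dir [chkpnt_period]" command
#   $1 = checkpoint directory
#   $2 = checkpoint period in minutes, or "" for no periodic checkpointing
#   remaining arguments = the job command
job_level_cmd() {
    dir=$1
    period=$2
    shift 2
    if [ -n "$period" ]; then
        printf 'bsub -k "%s %s" %s\n' "$dir" "$period" "$*"
    else
        printf 'bsub -k "%s" %s\n' "$dir" "$*"
    fi
}

# Print the queue-level form: the queue's CHKPNT parameter supplies
# the checkpoint directory, so only the queue is named.
#   $1 = queue name, remaining arguments = the job command
queue_level_cmd() {
    queue=$1
    shift
    printf 'bsub -q %s %s\n' "$queue" "$*"
}

job_level_cmd /share/lsf/chkpnt 30 ./my_job
queue_level_cmd chkpnt_queue ./my_job
```

If both forms are combined in one submission, the job-level values given with bsub -k take precedence over the queue-level CHKPNT values.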
Configuration to enable user-level checkpoint and restart
To enable user-level checkpoint and restart, you must link your application object files to the
LSF checkpoint libraries provided in LSF_LIBDIR. You do not have to change any code within
your application. For instructions on how to link application files, see the Platform LSF
Programmer’s Guide.
Configuration to enable application-level checkpoint and restart
Application-level checkpointing requires the presence of at least one echkpnt.application
executable in the directory specified by the parameter LSF_SERVERDIR in lsf.conf. Each
echkpnt.application must have a corresponding erestart.application.
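The pairing rule above can be verified with a short script. This is a sketch under stated assumptions rather than an LSF utility: the function name check_chkpnt_pairs is hypothetical, and the directory to scan is passed as an argument instead of being read from lsf.conf.

```shell
# Report whether every echkpnt.<application> executable in a directory
# has a matching erestart.<application> executable.
# Returns 0 only if at least one pair exists and none is incomplete.
check_chkpnt_pairs() {
    serverdir=$1
    status=0
    found=0
    for f in "$serverdir"/echkpnt.*; do
        # Skip non-executables; this also skips the literal pattern
        # that the shell leaves in place when nothing matches.
        [ -f "$f" ] && [ -x "$f" ] || continue
        found=1
        app=${f##*/echkpnt.}    # text after "echkpnt." is the application name
        if [ -x "$serverdir/erestart.$app" ]; then
            echo "ok: $app"
        else
            echo "missing erestart.$app"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no echkpnt.application executables found"
        status=1
    fi
    return $status
}
```

A typical call would be check_chkpnt_pairs "$LSF_SERVERDIR", assuming that variable is set in the shell environment to the same value as the LSF_SERVERDIR parameter in lsf.conf.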
Feature: Job checkpoint and restart