LSF Version 7.3 - Platform LSF Configuration Reference
If the command line is…   And…                     Then…

bsub -k "my_dir"          In lsb.applications,     • LSF saves the checkpoint file to
                          CHKPNT_PERIOD=360          my_dir/job_ID every 360 minutes

bsub -k "240"             In lsb.applications,     • LSF saves the checkpoint file to
                          CHKPNT_DIR=app_dir         app_dir/job_ID every 240 minutes
                          CHKPNT_PERIOD=360
                          In lsb.queues,
                          CHKPNT=other_dir
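
The two rows above suggest a simple fallback order: a value given on the command line is used first, then the value in lsb.applications, then the value in lsb.queues. The following is a minimal sketch of that order, not LSF's actual implementation. The function resolve_chkpnt and its argument layout are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of LSF: models the precedence implied by the
# table above (job level first, then lsb.applications, then lsb.queues).
# Usage: resolve_chkpnt JOB_DIR JOB_PERIOD APP_DIR APP_PERIOD QUEUE_DIR QUEUE_PERIOD
# Pass an empty string ("") for any value that is not set at that level.
resolve_chkpnt() {
    dir=${1:-${3:-$5}}       # first non-empty directory wins
    period=${2:-${4:-$6}}    # first non-empty period wins
    echo "${dir}/job_ID every ${period} minutes"
}

# Row 1: directory from bsub -k, period from lsb.applications
resolve_chkpnt "my_dir" "" "" "360" "" ""

# Row 2: period from bsub -k, directory from lsb.applications;
# the lsb.queues directory is not used because lsb.applications sets one
resolve_chkpnt "" "240" "app_dir" "360" "other_dir" ""
```

Each level is consulted only when every higher level leaves the value empty, which is why the queue-level directory other_dir never appears in the second result.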
Configuration to modify job checkpoint and restart
Several configuration parameters modify job checkpoint and restart behavior by:
• Specifying mandatory application-level checkpoint and restart executables that apply to
  all checkpointable batch jobs in the cluster
• Specifying the directory that contains customized application-level checkpoint and restart
  executables
• Saving standard output and standard error to files in the checkpoint directory
• Automatically checkpointing jobs before suspending or terminating them
• For Cray systems only, copying all open job files to the checkpoint directory
Configuration to specify mandatory application-level executables
You can specify mandatory checkpoint and restart executables by defining the parameter
LSB_ECHKPNT_METHOD in lsf.conf or as an environment variable.
Configuration file   Parameter and syntax                       Behavior

lsf.conf             LSB_ECHKPNT_METHOD="echkpnt_application"   • The specified echkpnt runs for all batch jobs
                                                                  submitted to the cluster. At restart, the
                                                                  corresponding erestart runs.
                                                                • For example, if LSB_ECHKPNT_METHOD=fluent,
                                                                  at checkpoint, LSF runs echkpnt.fluent and
                                                                  at restart, LSF runs erestart.fluent.
                                                                • If an LSF user specifies a different
                                                                  echkpnt_application at the job level using
                                                                  bsub -k or bmod -k, the job level value
                                                                  overrides the value in lsf.conf.
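
The naming and override rules in the table above can be sketched as a small shell function. This is an illustrative model only, not LSF code. The function resolve_echkpnt and the method name myapp are hypothetical; fluent is the example value used in the table.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of LSF: shows which executables would be
# selected under the rules described in the table above.
# $1 = method given at the job level (empty string if none)
# $2 = LSB_ECHKPNT_METHOD value from lsf.conf or the environment
resolve_echkpnt() {
    method=${1:-$2}   # the job-level value overrides the lsf.conf value
    echo "echkpnt.${method} erestart.${method}"
}

# No job-level method: the lsf.conf value applies
resolve_echkpnt "" "fluent"

# Job-level method "myapp" (hypothetical name) overrides lsf.conf
resolve_echkpnt "myapp" "fluent"
```

The first call selects echkpnt.fluent and erestart.fluent, matching the example in the table. The second call selects the myapp pair because the job-level value takes precedence.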
Feature: Job checkpoint and restart