LSF Version 7.3 - Platform LSF Configuration Reference
Important:
The erestart.application executable must:
• Have access to the command line used to submit or modify the job
• Exit with a return value without running an application; the erestart interface runs the application to restart the job
Executable file   UNIX naming convention               Windows naming convention
echkpnt           LSF_SERVERDIR/echkpnt.application    LSF_SERVERDIR\echkpnt.application.exe
                                                       LSF_SERVERDIR\echkpnt.application.bat
erestart          LSF_SERVERDIR/erestart.application   LSF_SERVERDIR\erestart.application.exe
                                                       LSF_SERVERDIR\erestart.application.bat
Restriction:
The names echkpnt.default and erestart.default are
reserved. Do not use these names for application-level
checkpoint and restart executables.
Valid file names contain only alphanumeric characters,
underscores (_), and hyphens (-).
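The naming restriction above can be sketched as a small check. This is only an illustration: the function name is_valid_method_name is hypothetical, and it assumes the character rule applies to the application suffix that follows echkpnt. or erestart. rather than to the whole file name.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of LSF.
 * Returns 1 if `method` could serve as the "application" suffix of
 * echkpnt.application / erestart.application, and 0 otherwise.
 * Assumption: the character rule applies to the suffix only. */
int is_valid_method_name(const char *method)
{
    const char *p;

    if (method == NULL || *method == '\0')
        return 0;                      /* empty names are rejected */
    if (strcmp(method, "default") == 0)
        return 0;                      /* echkpnt.default and erestart.default are reserved */

    for (p = method; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c) && c != '_' && c != '-')
            return 0;                  /* only alphanumerics, '_' and '-' are accepted */
    }
    return 1;
}
```

Under these assumptions, a suffix such as fluent or my_app-2 passes the check, while default or a suffix containing a period does not.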
For application-level checkpoint and restart, once LSF_SERVERDIR contains one or more
checkpoint and restart executables, users can specify the external checkpoint executable
to associate with each checkpointable job they submit. At restart, LSF invokes the corresponding
external restart executable.
Requirements for application-level checkpoint and restart executables
• The executables must be written in C or Fortran.
• The directory/name combinations must be unique within the cluster. For example, you
can write two different checkpoint executables with the name echkpnt.fluent and save
them as LSF_SERVERDIR/echkpnt.fluent and my_execs/echkpnt.fluent. To
run checkpoint and restart executables from a directory other than LSF_SERVERDIR, you
must configure the parameter LSB_ECHKPNT_METHOD_DIR in lsf.conf.
• Your executables must return the following values:
  • An echkpnt.application must return a value of 0 when checkpointing succeeds and a
    non-zero value when checkpointing fails.
  • The erestart interface provided with LSF restarts the job using a restart command
    that erestart.application writes to a file. The return value indicates whether
    erestart.application successfully writes the parameter definition
    LSB_RESTART_CMD=restart_command to the file checkpoint_dir/job_ID/.restart_cmd.
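The restart requirement can be sketched in C as follows. This is a minimal illustration, not the LSF implementation: the function name write_restart_cmd is hypothetical, the checkpoint directory, job ID, and restart command are passed in as plain strings because the text here does not say how erestart.application obtains them, and returning 0 for success mirrors the echkpnt.application convention as an assumption.

```c
#include <stdio.h>

/* Hypothetical helper, not part of LSF.
 * Writes the line LSB_RESTART_CMD=<restart_command> to
 * <checkpoint_dir>/<job_id>/.restart_cmd.
 * Returns 0 on success and a non-zero value on failure
 * (assumed convention, mirroring echkpnt.application). */
int write_restart_cmd(const char *checkpoint_dir, const char *job_id,
                      const char *restart_command)
{
    char path[4096];
    FILE *fp;
    int n;

    if (checkpoint_dir == NULL || job_id == NULL || restart_command == NULL)
        return 1;

    n = snprintf(path, sizeof(path), "%s/%s/.restart_cmd",
                 checkpoint_dir, job_id);
    if (n < 0 || (size_t)n >= sizeof(path))
        return 1;                      /* path did not fit in the buffer */

    fp = fopen(path, "w");
    if (fp == NULL)
        return 1;                      /* directory missing or not writable */

    /* Assumption: one definition per line, terminated by a newline. */
    if (fprintf(fp, "LSB_RESTART_CMD=%s\n", restart_command) < 0) {
        fclose(fp);
        return 1;
    }
    return fclose(fp) == 0 ? 0 : 1;    /* fclose can report a failed flush */
}
```

An erestart.application built around such a helper would return its result as the exit status and stop there, leaving the erestart interface to run the restart command.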
Feature: Job checkpoint and restart