LSF Version 7.3 - Platform LSF Configuration Reference
Applicability Details
Dependencies
• UNIX and Windows user accounts must be valid on all hosts in the cluster, or the
  correct type of account mapping must be enabled.
    • For a mixed UNIX/Windows cluster, UNIX/Windows user account mapping
      must be enabled.
    • For a cluster with a non-uniform user name space, between-host account
      mapping must be enabled.
    • For a MultiCluster environment with a non-uniform user name space,
      cross-cluster user account mapping must be enabled.
• The checkpoint and restart executables run under the user account of the user who
  submits the job. User accounts must have the correct permissions to:
    • Successfully run executables located in LSF_SERVERDIR or
      LSB_ECHKPNT_METHOD_DIR
    • Write to the checkpoint directory
• The erestart.application executable must have access to the original command
  line used to submit the job.
• For user-level checkpoint and restart, you must have access to your application
  object (.o) files.
• To allow restart of a checkpointed job on a different host than the host on which
  the job originally ran, both the original and the new hosts must:
    • Be binary compatible
    • Run the same dot version of the operating system for predictable results
    • Have network connectivity and read/execute permissions to the checkpoint and
      restart executables (in LSF_SERVERDIR by default)
    • Have network connectivity and read/write permissions to the checkpoint
      directory and the checkpoint file
    • Have access to all files open during job execution so that LSF can locate them
      using an absolute path name
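The permission requirements above can be spot-checked on a host before relying on checkpoint and restart there. The following is a minimal sketch, not an LSF tool: the function name check_chkpnt_host is hypothetical, and it assumes the executables are named echkpnt and erestart and live in the directory passed as the second argument (LSF_SERVERDIR if omitted). It checks only file permissions for the current user; it does not verify binary compatibility, operating system version, or account mapping.

```shell
# Hypothetical preflight check (not part of LSF).
# Usage: check_chkpnt_host CHKPNT_DIR [EXEC_DIR]
# Returns 0 if the current user can read/execute the checkpoint and restart
# executables and can read/write the checkpoint directory; 1 otherwise.
check_chkpnt_host() {
    chkpnt_dir=$1
    # Fall back to LSF_SERVERDIR, the default location of the executables.
    exec_dir=${2:-${LSF_SERVERDIR:-}}
    status=0

    if [ -z "$exec_dir" ]; then
        echo "FAIL: no executable directory given and LSF_SERVERDIR is unset" >&2
        return 1
    fi

    # Both executables must be readable and executable by this user.
    for exe in echkpnt erestart; do
        if [ ! -r "$exec_dir/$exe" ] || [ ! -x "$exec_dir/$exe" ]; then
            echo "FAIL: cannot read/execute $exec_dir/$exe" >&2
            status=1
        fi
    done

    # The checkpoint directory must exist and be readable and writable.
    if [ ! -d "$chkpnt_dir" ] || [ ! -r "$chkpnt_dir" ] || [ ! -w "$chkpnt_dir" ]; then
        echo "FAIL: $chkpnt_dir is not a readable, writable directory" >&2
        status=1
    fi

    return "$status"
}
```

Run the function as the submitting user on both the original host and the candidate restart host, for example check_chkpnt_host /share/chkpnt (a placeholder path). A non-zero return on either host suggests a restart there is likely to fail.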
Limitations
• bmod cannot change the echkpnt and erestart executables associated with a
  job.
Configuration to enable job checkpoint and restart
The job checkpoint and restart feature requires that a job be made checkpointable at the job
or queue level. LSF users can make jobs checkpointable by submitting jobs using bsub -k
and specifying a checkpoint directory. Queue administrators can make all jobs in a queue
checkpointable by specifying a checkpoint directory for the queue.
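As an illustration of the job-level case, the sketch below assembles a bsub -k command line. It assumes the commonly documented form in which the quoted -k argument is the checkpoint directory optionally followed by a checkpoint period in minutes. The helper name chkpnt_bsub_cmd, the path /share/chkpnt, and the job ./my_job are placeholders. The helper only prints the command so that it can be reviewed before being run on a submission host.

```shell
# Hypothetical dry-run helper (not part of LSF): prints a bsub command line
# that makes a job checkpointable, without submitting anything.
# Usage: chkpnt_bsub_cmd CHKPNT_DIR PERIOD_MINUTES COMMAND...
# Pass "" as PERIOD_MINUTES to omit the periodic checkpoint interval.
chkpnt_bsub_cmd() {
    chkpnt_dir=$1
    period=$2
    shift 2

    # The -k argument is one quoted string: directory, then optional period.
    if [ -n "$period" ]; then
        k_arg="$chkpnt_dir $period"
    else
        k_arg="$chkpnt_dir"
    fi

    printf 'bsub -k "%s" %s\n' "$k_arg" "$*"
}
```

For example, chkpnt_bsub_cmd /share/chkpnt 30 ./my_job prints bsub -k "/share/chkpnt 30" ./my_job. For the queue-level case, the checkpoint directory is set in the queue definition instead (the CHKPNT parameter in lsb.queues, to the best of our knowledge); confirm the exact syntax in the lsb.queues reference for your LSF version.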
Feature: Job checkpoint and restart