Platform LSF Reference Version 6.2

bmig
migrates checkpointable or rerunnable jobs
SYNOPSIS
bmig [-f] [job_ID | "job_ID[index_list]"] ...
bmig [-f] [-J job_name] [-m "host_name ..." | -m "host_group ..."] [-u user_name | -u user_group | -u all] [0]
bmig [-h | -V]
DESCRIPTION
Migrates one or more of your checkpointable or rerunnable jobs. LSF administrators
and root can migrate jobs submitted by other users.
By default, migrates a single job: the most recently submitted job, or the most recently
submitted job that also satisfies any other specified options (-u and -J). Specify 0 (zero)
to migrate multiple jobs.
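The selection rules above can be illustrated with a short dry-run sketch. The run helper, the job name "render", the user "alice", and the job IDs are hypothetical and used only for illustration; the helper prints each command rather than executing it, so the sketch can be read on a host without LSF.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Default: migrate only your most recently submitted job.
run bmig
# Most recently submitted job named "render" (hypothetical job name).
run bmig -J render
# Every job named "render" owned by "alice" (hypothetical user); 0 selects all matches.
run bmig -J render -u alice 0
# An explicit job ID and selected job array elements (hypothetical IDs).
run bmig 1234 "5678[1-3]"
```

Set DRY_RUN=0 on a host where LSF is installed to execute the commands instead of printing them.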
To migrate a job, the source and destination hosts must be binary compatible, run the
same OS version, have access to the executable, have access to all open files (LSF must
locate them with an absolute path name), and have access to the checkpoint directory.
Only started jobs can be migrated (i.e., running or suspended jobs); pending jobs cannot
be migrated.
Members of a chunk job can be migrated. Chunk jobs in WAIT state are removed from
the job chunk and put into PEND state.
When a checkpointable job is migrated, LSF checkpoints and kills the job (similar to the
-k option of bchkpnt(1)) then restarts it on the next available host. If the checkpoint is
not successful, the job is not killed and remains on the host. If a job is being
checkpointed when bmig is issued, the migration request is ignored. This situation may
occur if periodic checkpointing is enabled.
With the MultiCluster job forwarding model, you can operate on a MultiCluster job only
from the execution cluster, and the job is restarted on the same host. To move the
job to a different host, use brun. Use brun -b if another host might not have access
to the checkpoint directory.
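As a rough sketch of the choice between brun and brun -b described above, the hypothetical helper below only builds and prints the command line. The function name brun_cmd, the host names, and the job ID are invented for illustration, and the use of -m to name the destination host is assumed from the usual brun(1) usage rather than taken from this page.

```shell
#!/bin/sh
# Hypothetical helper: build the brun command line for moving a job.
# $1 = destination host, $2 = job ID, $3 = "shared" if the checkpoint
# directory is reachable from the destination host, anything else otherwise.
brun_cmd() {
    host=$1
    job=$2
    chkpnt=$3
    if [ "$chkpnt" = "shared" ]; then
        printf 'brun -m %s %s\n' "$host" "$job"
    else
        # -b: use when the new host might not reach the checkpoint directory.
        printf 'brun -b -m %s %s\n' "$host" "$job"
    fi
}

brun_cmd hostB 1234 shared
brun_cmd hostC 1234 unshared
```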
When a rerunnable job is migrated, LSF kills the job (similar to bkill(1)) then restarts
it from the beginning on the next available host.
The environment variable LSB_RESTART is set to Y when a migrating job is restarted
or rerun.
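A job script can use LSB_RESTART to tell a first start from a restart after migration. The following is a minimal sketch, assuming the script is the job's entry point; the function name start_mode and the messages are hypothetical.

```shell
#!/bin/sh
# Report "resume" when LSF has set LSB_RESTART=Y, otherwise "fresh".
start_mode() {
    if [ "${LSB_RESTART:-}" = "Y" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

case "$(start_mode)" in
    resume) echo "Restarted or rerun after migration: reusing existing output" ;;
    fresh)  echo "First start: initializing output" ;;
esac
```

A rerunnable job starts again from the beginning, so a check of this kind is mainly useful for skipping or cleaning up work that an earlier run already produced.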
A job is made rerunnable by specifying the -r option on the command line with
bsub(1) or bmod(1), or automatically by configuring RERUNNABLE in lsb.queues(5).
A job is made checkpointable by specifying the location of a checkpoint directory on the
command line with the -k option of bsub(1) or bmod(1), or automatically by
configuring CHKPNT in lsb.queues(5).
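The two ways of making a job eligible for migration at submission time might look like the sketch below. The helper submit_cmd, the job command ./my_job, and the directory /share/chkpnt are hypothetical; the helper only prints the bsub command line so that it can be inspected before use.

```shell
#!/bin/sh
# Hypothetical helper: print the bsub command for a migratable job.
# $1 = "rerun" or "chkpnt"; $2 = checkpoint directory (chkpnt mode only).
submit_cmd() {
    case "$1" in
        rerun)  printf 'bsub -r ./my_job\n' ;;
        chkpnt) printf 'bsub -k %s ./my_job\n' "$2" ;;
        *)      echo "usage: submit_cmd rerun | chkpnt <dir>" >&2; return 1 ;;
    esac
}

# Rerunnable job: restarted from the beginning when migrated.
submit_cmd rerun
# Checkpointable job: restarted from its checkpoint when migrated.
submit_cmd chkpnt /share/chkpnt
```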