Platform LSF Administration Guide Version 6.2

Chapter 25
Job Checkpoint, Restart, and Migration
Administering Platform LSF
Automatically Migrating Jobs
Automatic job migration assumes that if a job is suspended (SSUSP) for an extended
period of time, due to load conditions or any other reason, the execution host is
heavily loaded. To let the job make progress and to reduce the load on the host, you
configure a migration threshold: the number of minutes a job can remain suspended
before LSF migrates it. Migration thresholds can be configured for queues and for hosts.
When configured on a queue, the threshold applies to all jobs submitted to the queue.
When configured on a host, the threshold applies to all jobs running on the host.
When a migration threshold is configured on both a queue and a host, the lower value
is used. If the migration threshold is set to 0 (zero), the job is migrated
immediately upon suspension (SSUSP).
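The threshold rules above can be summarized in a short Python sketch. It is an illustrative model, not LSF's implementation: the functions effective_threshold and should_migrate are hypothetical, None stands for a level where MIG is not configured, and a job is assumed to be due for migration once it has been suspended for at least the threshold number of minutes.

```python
from typing import Optional


def effective_threshold(queue_mig: Optional[int],
                        host_mig: Optional[int]) -> Optional[int]:
    """Return the migration threshold (minutes) that applies to a job.

    None means MIG is not configured at that level (hypothetical model).
    When both levels are configured, the lower value is used.
    """
    configured = [t for t in (queue_mig, host_mig) if t is not None]
    return min(configured) if configured else None


def should_migrate(minutes_suspended: float,
                   threshold: Optional[int]) -> bool:
    """Decide whether a job suspended (SSUSP) this long is due to migrate."""
    if threshold is None:
        return False  # no threshold anywhere: no automatic migration
    # Assumption: due once suspended for at least `threshold` minutes,
    # so MIG=0 is satisfied as soon as the job is suspended.
    return minutes_suspended >= threshold


# Queue MIG=30, host MIG=10: the lower host value wins.
print(effective_threshold(30, 10))                      # 10
print(should_migrate(12, effective_threshold(30, 10)))  # True
```

Filtering on `is not None` rather than truthiness keeps a configured value of 0 in play, so a host or queue with MIG=0 still wins as the lower threshold.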
You can use bmig at any time to override a configured threshold.
Configuring queue migration threshold
To configure a migration threshold for a queue, edit lsb.queues and specify a
threshold for the MIG parameter. For example, to configure a queue to migrate
suspended jobs after 30 minutes:
Begin Queue
...
# Migration threshold set to 30 mins
MIG=30
DESCRIPTION=Migrate suspended jobs after 30 mins
...
End Queue
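To illustrate how MIG sits alongside the other KEY=VALUE parameters of a queue section, the following Python sketch collects those pairs from Begin Queue ... End Queue sections. The parse_queue_sections helper is hypothetical and handles only the simple layout shown here (one parameter per line, whole-line # comments); it is not a full lsb.queues parser and is not part of LSF.

```python
from typing import Dict, List, Optional


def parse_queue_sections(text: str) -> List[Dict[str, str]]:
    """Collect KEY=VALUE pairs from each Begin Queue ... End Queue section.

    Hypothetical helper: handles one parameter per line and whole-line
    '#' comments only.
    """
    queues: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "Begin Queue":
            current = {}
        elif line == "End Queue":
            if current is not None:
                queues.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return queues


sample = """Begin Queue
MIG=30
DESCRIPTION=Migrate suspended jobs after 30 mins
End Queue
"""
queue = parse_queue_sections(sample)[0]
print(int(queue["MIG"]))  # 30
```

For the queue section above, lines without an equals sign (such as the ... placeholders) are ignored, and the resulting dictionary maps MIG to "30".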
Configuring host migration threshold
To configure a migration threshold for a host, edit lsb.hosts and specify a threshold
in the MIG column for that host. For example, to configure hostA to migrate
suspended jobs after 30 minutes:
Begin Host
HOST_NAME   r1m   pg    MIG    # Keywords
...
hostA       5.0   18    30
...
End Host
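Unlike lsb.queues, the Host section uses a column layout: the first line lists keywords, and each following line gives one host's values in the same column order. The Python sketch below shows that mapping by reading the MIG column for each host. parse_host_mig is a hypothetical helper, not part of LSF; it assumes whitespace-separated columns, strips # comments, and skips MIG entries that are not plain integers.

```python
from typing import Dict, List, Optional


def parse_host_mig(text: str) -> Dict[str, int]:
    """Map host name -> MIG threshold from Begin Host ... End Host sections.

    Hypothetical helper: the first non-comment line in a section is the
    keyword row; each later line holds one host's values in that order.
    """
    result: Dict[str, int] = {}
    in_section = False
    columns: Optional[List[str]] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line == "...":
            continue
        if line == "Begin Host":
            in_section, columns = True, None
        elif line == "End Host":
            in_section = False
        elif in_section and columns is None:
            columns = line.split()  # e.g. HOST_NAME r1m pg MIG
        elif in_section:
            row = dict(zip(columns, line.split()))
            host = row.get("HOST_NAME")
            mig = row.get("MIG", "")
            if host and mig.isdigit():  # skip non-numeric entries
                result[host] = int(mig)
    return result


sample = """Begin Host
HOST_NAME  r1m  pg  MIG   # Keywords
hostA      5.0  18  30
End Host
"""
print(parse_host_mig(sample))  # {'hostA': 30}
```

Because values are matched to keywords by position, a row with fewer or reordered columns would be misread; the sketch relies on every host row following the keyword row exactly.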
Requeuing migrating jobs
By default, LSF restarts or reruns a migrating job on the next available host, bypassing
all pending jobs.
You can configure LSF to requeue migrating jobs rather than restarting them
immediately. Requeued jobs enter the PEND state and are ordered according to their
original submission time and priority. To requeue migrating jobs, edit lsf.conf and
set LSB_MIG2PEND=1.
Additionally, you can configure LSF to requeue migrating jobs to the bottom of the
queue. To do so, edit lsf.conf and set both LSB_MIG2PEND=1 and
LSB_REQUEUE_TO_BOTTOM=1.
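The difference between the two requeue settings can be modeled with a simplified Python sketch. It assumes a pending list already in dispatch order, that a higher priority number dispatches first, and that ties are broken by earlier submission time. The Job class and the requeue_migrating function are hypothetical and do not reproduce LSF's full scheduling policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: int
    submit_time: float  # original submission time
    priority: int = 0   # assumption: higher value dispatches first


def requeue_migrating(pending: List[Job], job: Job,
                      to_bottom: bool) -> List[Job]:
    """Return the pending list after a migrating job is requeued in PEND.

    to_bottom=False models LSB_MIG2PEND=1 alone: the job is placed by
    priority and original submission time. to_bottom=True models
    LSB_MIG2PEND=1 together with LSB_REQUEUE_TO_BOTTOM=1.
    `pending` is assumed to be in dispatch order already.
    """
    if to_bottom:
        return pending + [job]  # placed after every pending job
    return sorted(pending + [job],
                  key=lambda j: (-j.priority, j.submit_time))


pending = [Job(1, submit_time=100.0), Job(2, submit_time=200.0)]
migrating = Job(3, submit_time=50.0)  # submitted before jobs 1 and 2
print([j.job_id for j in requeue_migrating(pending, migrating, False)])  # [3, 1, 2]
print([j.job_id for j in requeue_migrating(pending, migrating, True)])   # [1, 2, 3]
```

In this model, a migrating job that was submitted early regains its place near the front with LSB_MIG2PEND=1 alone, while adding LSB_REQUEUE_TO_BOTTOM=1 makes it wait behind every job already pending.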