LSF Version 7.3 - Administering Platform LSF

Managing Jobs
Run windows during which jobs from the queue can run
Limits on the number of job slots configured for a queue, a host, or a user
Relative priority to other users and jobs
Availability of the specified resources
Job dependency and pre-execution conditions
Maximum pending job threshold
If the user or user group submitting a job has reached the pending job threshold
specified by MAX_PEND_JOBS (either in the User section of lsb.users or
cluster-wide in lsb.params), LSF rejects any further job submission requests
from that user or user group. The system keeps resending the rejected submission
request at the interval specified by SUB_TRY_INTERVAL in lsb.params until it
has made the number of attempts specified by the LSB_NTRIES environment
variable. If LSB_NTRIES is not defined, the default behavior is to keep
resending rejected submission requests indefinitely.
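The retry behavior described above can be modeled as a simple loop. This is a minimal, hypothetical sketch, not LSF's implementation. The callable `try_submit` stands in for one submission attempt and returns False when the pending job threshold rejects it. The parameter `sub_try_interval` plays the role of SUB_TRY_INTERVAL (in seconds), and `lsb_ntries` mirrors LSB_NTRIES, where `None` means the variable is undefined.

```python
import time
from typing import Callable, Optional


def submit_with_retry(
    try_submit: Callable[[], bool],
    sub_try_interval: float,
    lsb_ntries: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical model of resubmission after a MAX_PEND_JOBS rejection.

    Returns the number of attempts made once a submission is accepted.
    Raises RuntimeError after lsb_ntries rejected attempts.
    lsb_ntries=None models an undefined LSB_NTRIES: retry indefinitely.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_submit():
            return attempts  # accepted: the user is back under the threshold
        if lsb_ntries is not None and attempts >= lsb_ntries:
            raise RuntimeError(f"submission rejected after {attempts} attempts")
        sleep(sub_try_interval)  # wait one SUB_TRY_INTERVAL before retrying


# Example: rejected twice by the pending job threshold, accepted on the third try.
responses = iter([False, False, True])
print(submit_with_retry(lambda: next(responses), 0.0, lsb_ntries=5, sleep=lambda s: None))  # 3
```

The `sleep` parameter is injectable only so the sketch can be exercised without real delays.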
Suspended jobs
A job can be suspended at any time by its owner, by the LSF administrator, by the
root user (superuser), or by LSF.
After a job has been dispatched and started on a host, it can be suspended by LSF.
When a job is running, LSF periodically checks the load level on the execution host.
If any load index is beyond either its per-host or its per-queue suspending
conditions, the lowest priority batch job on that host is suspended.
If the load on the execution host or hosts becomes too high, batch jobs may be
interfering with each other or with interactive jobs. In either case, some jobs
should be suspended to maximize host performance or to guarantee interactive
response time.
LSF suspends jobs according to the priority of the job's queue. When a host is busy,
LSF suspends lower-priority jobs first unless the scheduling policy associated with
the job dictates otherwise.
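The load check and priority ordering above can be illustrated with a small sketch. It rests on several simplifying assumptions:

- Each suspending condition is a single numeric threshold.
- A higher load index value means a busier host. This holds for indices such as r15s or ut but not for free-memory indices.
- A larger `queue_priority` means a higher-priority queue.
- Scheduling policies that override the priority order are ignored.

The names `RunningJob`, `exceeds`, and `pick_job_to_suspend` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunningJob:
    job_id: int
    queue_priority: int  # larger value = higher-priority queue (assumption)
    # Per-queue suspending conditions: load index name -> threshold.
    queue_thresholds: Dict[str, float] = field(default_factory=dict)


def exceeds(load: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True if any load index is above its threshold (higher = busier, assumed)."""
    return any(load.get(name, 0.0) > limit for name, limit in thresholds.items())


def pick_job_to_suspend(
    load: Dict[str, float],
    host_thresholds: Dict[str, float],
    jobs: List[RunningJob],
) -> Optional[RunningJob]:
    """Return the lowest-priority job to suspend, or None if no condition is exceeded."""
    if not jobs:
        return None
    overloaded = exceeds(load, host_thresholds) or any(
        exceeds(load, job.queue_thresholds) for job in jobs
    )
    if not overloaded:
        return None
    # Suspend the job from the lowest-priority queue first.
    return min(jobs, key=lambda job: job.queue_priority)


jobs = [RunningJob(101, queue_priority=40), RunningJob(102, queue_priority=20)]
victim = pick_job_to_suspend({"r15s": 3.5}, {"r15s": 3.0}, jobs)
print(victim.job_id if victim else None)  # 102
```

Because the check runs periodically, a host that stays overloaded would have further jobs suspended on later passes, still in priority order.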
Jobs are also suspended by the system if the job's queue has a run window and the
current time falls outside that window.
LSF can later resume a system-suspended job when the load on the execution
hosts falls low enough, or when the queue's closed run window opens again.
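The run-window part of this rule can be sketched as a state transition. This is a hypothetical helper, not LSF code. It makes the following assumptions:

- A run window is a pair of `datetime.time` values on a daily schedule.
- A window may wrap past midnight (for example, 20:00 to 08:30).
- Only the RUN and SSUSP (system-suspended) states are modeled.
- The load condition that can also trigger resumption is ignored.

LSF's own run window configuration supports richer forms, such as day-of-week prefixes, which this sketch does not handle.

```python
from datetime import time
from typing import List, Tuple

Window = Tuple[time, time]  # (open, close) on a daily schedule


def window_open(now: time, windows: List[Window]) -> bool:
    """True if `now` falls inside any window; a window may wrap past midnight."""
    for start, end in windows:
        if start <= end:
            if start <= now < end:
                return True
        elif now >= start or now < end:  # e.g. 20:00-08:30 crosses midnight
            return True
    return False


def next_state(state: str, now: time, windows: List[Window]) -> str:
    """Hypothetical RUN <-> SSUSP transition driven only by the queue run window."""
    is_open = window_open(now, windows)
    if state == "RUN" and not is_open:
        return "SSUSP"  # current time moved outside the run window
    if state == "SSUSP" and is_open:
        return "RUN"  # the closed run window opened again
    return state


night = [(time(20, 0), time(8, 30))]  # hypothetical overnight window
print(next_state("RUN", time(9, 0), night))      # SSUSP
print(next_state("SSUSP", time(21, 15), night))  # RUN
```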
WAIT state (chunk jobs)
If you have configured chunk job queues, members of a chunk job that are waiting
to run are displayed as WAIT by bjobs. Any jobs in WAIT status are included in
the count of pending jobs by bqueues and busers, even though the entire chunk job
has been dispatched and occupies a job slot. In bhosts output, the entire chunk
job is counted as a single occupied job slot in the NJOBS column.
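The two counting rules above can be contrasted in a short model. This is a hypothetical sketch, not how LSF stores jobs. The `Job` records stand in for what bjobs reports, and `chunk_id` is an illustrative field that groups the members of one chunk job. Each job is assumed to use one slot, and only the PEND, WAIT, and RUN states are modeled. Suspended states, which bhosts also counts, are left out for brevity.

```python
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    job_id: int
    status: str               # "PEND", "WAIT", or "RUN", as displayed by bjobs
    chunk_id: Optional[int]   # hypothetical grouping key; None for ordinary jobs


def pending_count(jobs: List[Job]) -> int:
    """Pending count as bqueues/busers report it: WAIT members count as pending."""
    return sum(1 for j in jobs if j.status in ("PEND", "WAIT"))


def njobs_slots(jobs: List[Job]) -> int:
    """Slots as counted in the bhosts NJOBS column: a dispatched chunk uses one slot."""
    ordinary = sum(1 for j in jobs if j.chunk_id is None and j.status == "RUN")
    chunks = {j.chunk_id for j in jobs
              if j.chunk_id is not None and j.status in ("RUN", "WAIT")}
    return ordinary + len(chunks)


jobs = [
    Job(1, "RUN", chunk_id=7),
    Job(2, "WAIT", chunk_id=7),
    Job(3, "WAIT", chunk_id=7),
    Job(4, "RUN", chunk_id=None),
    Job(5, "PEND", chunk_id=None),
]
print(pending_count(jobs), njobs_slots(jobs))  # 3 2
```

The same two WAIT members are counted as pending work by one command and absorbed into a single occupied slot by the other.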
You can switch (bswitch) a chunk job member in WAIT state to another queue, or
migrate it (bmig).
See Chapter 30, “Chunk Job Dispatch” for more information about chunk jobs.