LSF Version 7.3 - Administering Platform LSF
Time-based Slot Reservation
Reservation scenarios
Scenario 1: Even though no running jobs finish and no host status in the cluster changes, a job's
future allocation may still change from one scheduling cycle to the next.
Why this happens: In each scheduling cycle, the scheduler recalculates a job's reservation
information, estimated start time, and opportunity for future allocation. The job's candidate
host list may be reordered according to current load, and the reordered list is used for the
entire scheduling cycle, including the calculation of the job's future allocation. A different
order of candidate hosts can therefore produce a different future allocation. However, the job's
estimated start time should stay the same.
For example, there are two hosts in the cluster, hostA and hostB, with 4 CPUs per host. Job 1 is
running and occupies 2 CPUs on hostA and 2 CPUs on hostB. Job 2 requests 6 CPUs. If the order of
hosts is hostA, hostB, the future allocation of job 2 is 4 CPUs on hostA and 2 CPUs on hostB. If
the order of hosts changes to hostB, hostA in the next scheduling cycle, the future allocation of
job 2 becomes 4 CPUs on hostB and 2 CPUs on hostA.
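The effect of host order can be illustrated with a simplified model. This is a minimal sketch, not LSF's scheduler: it assumes candidate hosts are filled greedily in list order, and it assumes job 1 ends at t=5 hours, since the example gives no end time. The host names and CPU counts come from the example; the functions `free_cpus` and `plan` are hypothetical.

```python
# Hypothetical model of one scheduling cycle's future-allocation calculation.
# Each host: (total CPUs, [(cpus, end_time_in_hours), ...] for running jobs).
HOSTS = {
    "hostA": (4, [(2, 5.0)]),  # job 1 holds 2 CPUs until t=5 (assumed)
    "hostB": (4, [(2, 5.0)]),  # job 1 holds 2 CPUs until t=5 (assumed)
}


def free_cpus(host, t):
    """CPUs on `host` not held by a job that is still running at time t."""
    total, running = HOSTS[host]
    return total - sum(cpus for cpus, end in running if end > t)


def plan(order, requested):
    """Return (estimated_start, allocation) for a job requesting `requested`
    CPUs, filling hosts greedily in candidate-list order (assumption)."""
    # Candidate start times: now (0.0) and every running job's end time.
    times = sorted({0.0} | {end for h in order for _, end in HOSTS[h][1]})
    for t in times:
        if sum(free_cpus(h, t) for h in order) >= requested:
            alloc, remaining = {}, requested
            for h in order:
                take = min(free_cpus(h, t), remaining)
                if take:
                    alloc[h] = take
                    remaining -= take
                if remaining == 0:
                    break
            return t, alloc
    return None, {}  # the request can never be satisfied


print(plan(["hostA", "hostB"], 6))  # (5.0, {'hostA': 4, 'hostB': 2})
print(plan(["hostB", "hostA"], 6))  # (5.0, {'hostB': 4, 'hostA': 2})
```

Only the candidate order differs between the two calls: the allocation flips between hosts, while the estimated start time stays at t=5.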
Scenario 2: If you set JOB_ACCEPT_INTERVAL to a non-zero value, then after a job is dispatched,
the estimated start time and future allocation of pending jobs may fluctuate momentarily within
the JOB_ACCEPT_INTERVAL period.
Why this happens: The scheduler performs a time-based reservation calculation in each cycle. If
JOB_ACCEPT_INTERVAL is set to a non-zero value, then once a new job has been dispatched to a
host, that host does not accept another job within the JOB_ACCEPT_INTERVAL interval. Because the
host is excluded for the entire scheduling cycle, it is left out of the time-based reservation
calculation, which may cause slight changes in the estimated start time and future allocation of
pending jobs. After JOB_ACCEPT_INTERVAL has passed, the host becomes available for time-based
reservation calculation again, and the estimated start time and future allocation of pending jobs
are accurate again.
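A minimal sketch of why the numbers fluctuate, under stated assumptions: a host that received a job within the accept window is simply dropped from the calculation for that cycle, and the window is modeled as a plain duration in hours rather than in the units LSF actually uses for JOB_ACCEPT_INTERVAL. All host names, run times, and the `estimate` function are hypothetical.

```python
# Hypothetical model: hosts are {name: (total_cpus, [(cpus, end_time), ...])},
# with all times in hours. The accept window is a plain duration (assumption).
def estimate(hosts, requested, now, accept_window, last_dispatch):
    """Return (estimated_start, allocation) for a pending job, skipping any
    host that received a job less than accept_window hours ago."""
    usable = [h for h in hosts
              if now - last_dispatch.get(h, float("-inf")) >= accept_window]

    def free(h, t):
        total, running = hosts[h]
        return total - sum(c for c, end in running if end > t)

    # Candidate start times: now, plus every future job end on a usable host.
    times = sorted({now} | {end for h in usable
                            for _, end in hosts[h][1] if end > now})
    for t in times:
        if sum(free(h, t) for h in usable) >= requested:
            alloc, remaining = {}, requested
            for h in usable:
                take = min(free(h, t), remaining)
                if take:
                    alloc[h] = take
                    remaining -= take
            return t, alloc
    return None, {}  # no usable host can ever satisfy the request


HOSTS = {
    "hostA": (4, [(4, 10.0)]),  # fully busy until t=10
    "hostB": (4, [(2, 3.0)]),   # a job was just dispatched here at t=0
}
LAST_DISPATCH = {"hostB": 0.0}

print(estimate(HOSTS, 4, now=0.1, accept_window=0.5, last_dispatch=LAST_DISPATCH))
# (10.0, {'hostA': 4})
print(estimate(HOSTS, 4, now=1.0, accept_window=0.5, last_dispatch=LAST_DISPATCH))
# (3.0, {'hostB': 4})
```

While hostB is inside the window, the pending job appears to start at t=10 on hostA; once the window has passed, the same calculation gives t=3 on hostB.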
Examples
Example 1: Three hosts, qat24, qat25, and qat26, have 4 CPUs each. Job 11895 uses 4 slots on
qat24 (10 hours), job 11896 uses 4 slots on qat25 (12 hours), and job 11897 uses 2 slots on
qat26 (9 hours).
Job 11898 is submitted and requests -n 6 -R "span[ptile=2]".
bjobs -l 11898
Job <11898>, User <user2>, Project <default>, Status <PEND>, Queue <challenge>,
Job Priority <50>, Command <sleep 100000000>
..
RUNLIMIT
840.0 min of hostA
Fri Apr 22 15:18:56: Reserved <2> job slots on host(s) <2*qat26>;
Sat Apr 23 03:28:46: Estimated Job Start Time;
alloc=2*qat25 2*qat24 2*qat26.lsf.platform.com
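The Example 1 result can be approximated with a simplified model of span[ptile=2]: the job needs 6 / 2 = 3 hosts, each with 2 free slots at the same time. This sketch assumes the stated run times are the remaining times from the submission of job 11898 and that -n is a multiple of ptile; `plan_ptile` is a hypothetical name. It reproduces the shape of the bjobs output (2 slots reserved on qat26 immediately, and 2 slots on each of the three hosts once job 11896 releases qat25 after 12 hours), but not its exact timestamps or the order in which hosts are listed.

```python
# Hypothetical sketch of Example 1, with end times in hours from the
# submission of job 11898 (assumption; the example gives only run times).
HOSTS = {
    "qat24": (4, [(4, 10.0)]),  # job 11895
    "qat25": (4, [(4, 12.0)]),  # job 11896
    "qat26": (4, [(2, 9.0)]),   # job 11897
}


def free(h, t):
    """Slots on host h not held by a job still running at time t."""
    total, running = HOSTS[h]
    return total - sum(c for c, end in running if end > t)


def plan_ptile(n, ptile):
    """Earliest time at which n // ptile hosts each have ptile free slots.
    Assumes n is a multiple of ptile."""
    hosts_needed = n // ptile
    times = sorted({0.0} | {end for _, jobs in HOSTS.values() for _, end in jobs})
    for t in times:
        fits = [h for h in HOSTS if free(h, t) >= ptile]
        if len(fits) >= hosts_needed:
            chosen = fits[:hosts_needed]
            # Hosts that already have enough free slots now can be reserved.
            reserved_now = [h for h in chosen if free(h, 0.0) >= ptile]
            return t, {h: ptile for h in chosen}, reserved_now
    return None, {}, []


print(plan_ptile(6, 2))
# (12.0, {'qat24': 2, 'qat25': 2, 'qat26': 2}, ['qat26'])
```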
Example 2: Two RMS hosts, sierraA and sierraB, have 8 CPUs per host. Job 3873 uses 4*sierra0
and will run for 10 hours. Job 3874 uses 4*sierra1 and will run for 12 hours. Job 3875 uses
2*sierra2 and 2*sierra3 and will run for 13 hours.