LSF Version 7.3 - Administering Platform LSF

EGO-enabled SLA scheduling
SLA_TIMER—controls how often each service class is evaluated and a network
message is sent to EGO communicating host demand. The default is 10
seconds.
LSF MXJ and EGO slots
In LSF, you configure the maximum number of job slots (MXJ) in lsb.hosts. By
default, MXJ equals the number of processors on the host. LSF schedules jobs
on that host based on the MXJ.
By default, when EGO-enabled SLA scheduling is configured, EGO allocates an
entire host to LSF, which uses its own MXJ definition to determine how many slots
are available on the host. LSF gets its host allocation from EGO, and runs as many
jobs as the LSF configured MXJ for that host dictates.
To allow partial sharing of hosts (for example, a large SMP computer) among
different consumers or workload managers, use MBD_USE_EGO_MXJ in
lsb.params. This forces LSF to use the job slot maximum configured in the EGO
consumer. When MBD_USE_EGO_MXJ is set, LSF schedules jobs based on the
number of slots allocated from EGO.
When MBD_USE_EGO_MXJ is set, the maximum number of jobs is set to the
number of slots EGO allocates to LSF. For example, if hostA has 4 processors
but EGO allocates 2 slots to an EGO-enabled SLA consumer, LSF can schedule a
maximum of 2 jobs from that SLA on hostA.
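The rule above can be sketched as a small Python model. This is an illustration only, not LSF code: the function name effective_job_limit and its parameters are hypothetical, and the sketch assumes the only inputs are the MXJ configured in lsb.hosts, the number of slots EGO allocates, and whether MBD_USE_EGO_MXJ is set.

```python
def effective_job_limit(lsf_mxj: int, ego_slots: int, use_ego_mxj: bool) -> int:
    """Return the maximum number of jobs LSF may run on one allocated host.

    Hypothetical model of the documented behavior, not an LSF API.
    """
    if use_ego_mxj:
        # MBD_USE_EGO_MXJ set: the limit is the number of slots EGO allocates.
        return ego_slots
    # Default: EGO allocates the entire host and LSF applies its own MXJ.
    return lsf_mxj


# hostA has 4 processors (so MXJ defaults to 4); EGO allocates 2 slots.
print(effective_job_limit(lsf_mxj=4, ego_slots=2, use_ego_mxj=True))   # 2
print(effective_job_limit(lsf_mxj=4, ego_slots=2, use_ego_mxj=False))  # 4
```

With MBD_USE_EGO_MXJ set, the model returns the EGO allocation (2) and ignores the host's processor count. Without it, the LSF MXJ (4) applies.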
LSF cannot release idle slots while jobs are running on the host. For example,
if EGO allocates 3 slots on a host, and one job finishes before the other two, the idle
slot is not available to other SLAs until all jobs finish on the host.
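A minimal Python sketch of this release rule, assuming a host's allocation is returned only as a whole. The function releasable_slots is hypothetical and models the behavior described here; it is not part of LSF or EGO.

```python
def releasable_slots(allocated_slots: int, running_jobs: int) -> int:
    """Slots on one host that LSF can return to EGO (hypothetical model)."""
    if running_jobs > 0:
        # While any job is still running on the host, no idle slot is released.
        return 0
    # All jobs on the host have finished: the whole allocation can be released.
    return allocated_slots


# EGO allocates 3 slots; one job has finished and two are still running.
print(releasable_slots(allocated_slots=3, running_jobs=2))  # 0
# All three jobs have finished.
print(releasable_slots(allocated_slots=3, running_jobs=0))  # 3
```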
Limitations of EGO-enabled SLA scheduling
Parallel jobs
Resource allocation is based on the number of jobs, not the slots required by the job.
EGO-enabled SLA requests resources based on velocity and the number of pending
jobs. If a parallel job requires multiple processors, the SLA may request fewer
processors than the job requires, which causes the job to remain pending. To avoid
this, you can configure a larger velocity in the SLA.
Multicluster
Resource export under the lease model is not guaranteed. With EGO-enabled SLA
scheduling, all resources are dynamic, so the exported hosts may not be allocated
to LSF.
Advance reservations
EGO-enabled SLA does not support advance reservations. Advance reservations
need to reserve resources for a specified time window, which is not currently
supported in EGO.
Job-level resource requirements (bsub -R)
LSF takes the job-level resource requirement into consideration for scheduling, but if
it does not match the resource requirement specified in the service class, the host
allocated by EGO may not satisfy the job-level requirement, and the job remains
pending. LSF treats the allocated host as idle and returns it to EGO. The pending job
causes another request to be sent to EGO, which allocates another host, which may
or may not satisfy the resource requirement.
Use EGO_RES_REQ=res_req in the service class configuration to specify all job
resource requirements.