LSF Version 7.3 - Administering Platform LSF

EGO-enabled SLA scheduling

◆ SLA_TIMER: controls how often each service class is evaluated and a network message is sent to EGO communicating host demand. The default is 10 seconds.

LSF MXJ and EGO slots

In LSF, you configure the maximum number of job slots (MXJ) in lsb.hosts. By default, MXJ equals the number of processors on the host. LSF schedules jobs on that host based on the MXJ.

By default, when EGO-enabled SLA scheduling is configured, EGO allocates an entire host to LSF, which uses its own MXJ definition to determine how many slots are available on the host. LSF gets its host allocation from EGO and runs as many jobs as the LSF-configured MXJ for that host allows.

To allow partial sharing of hosts (for example, a large SMP computer) among different consumers or workload managers, use MBD_USE_EGO_MXJ in lsb.params. This forces LSF to use the job slot maximum configured in the EGO consumer. When MBD_USE_EGO_MXJ is set, LSF schedules jobs based on the number of slots allocated from EGO.

When MBD_USE_EGO_MXJ is set, the maximum number of jobs is set to the number of slots EGO allocates to LSF. For example, if hostA has 4 processors but EGO allocates 2 slots to an EGO-enabled SLA consumer, LSF can schedule a maximum of 2 jobs from that SLA on hostA.
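The two behaviors described above can be illustrated with a minimal sketch. This is a toy model, not LSF code: the function name max_jobs_on_host and its parameters are hypothetical, and the flag use_ego_mxj only stands in for whether MBD_USE_EGO_MXJ is set in lsb.params.

```python
def max_jobs_on_host(lsf_mxj: int, ego_slots: int, use_ego_mxj: bool) -> int:
    """Upper bound on jobs from an EGO-enabled SLA on one host (toy model).

    lsf_mxj     -- MXJ configured in lsb.hosts (by default, the processor count)
    ego_slots   -- number of slots EGO allocated to LSF on this host
    use_ego_mxj -- hypothetical stand-in for MBD_USE_EGO_MXJ being set
    """
    if use_ego_mxj:
        # With MBD_USE_EGO_MXJ set, the limit is the EGO slot allocation.
        return ego_slots
    # Default: EGO allocates the whole host and LSF applies its own MXJ.
    return lsf_mxj


# hostA has 4 processors (default MXJ of 4); EGO allocates 2 slots.
print(max_jobs_on_host(lsf_mxj=4, ego_slots=2, use_ego_mxj=True))   # 2
print(max_jobs_on_host(lsf_mxj=4, ego_slots=2, use_ego_mxj=False))  # 4
```

The first call mirrors the hostA example: 2 slots from EGO means at most 2 jobs from that SLA, even though the host has 4 processors.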

LSF cannot release idle slots while jobs are still running on the host. For example, if EGO allocates 3 slots on a host, and one job finishes before the other two, the idle slot is not available to other SLAs until all jobs finish on the host.
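The release rule above can be sketched as a small function. This is an illustrative model only, assuming one slot per job; the name releasable_slots is hypothetical and does not correspond to any LSF or EGO interface.

```python
def releasable_slots(allocated_slots: int, running_jobs: int) -> int:
    """Slots LSF can return to EGO on one host (toy model, one slot per job)."""
    if running_jobs > 0:
        # Any running job pins the whole allocation on this host,
        # including slots that are already idle.
        return 0
    # Only when the host is fully drained can the slots be released.
    return allocated_slots


# EGO allocated 3 slots; one job has finished, two are still running.
print(releasable_slots(allocated_slots=3, running_jobs=2))  # 0
# All jobs have finished, so all 3 slots can be released.
print(releasable_slots(allocated_slots=3, running_jobs=0))  # 3
```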

Limitations of EGO-enabled SLA scheduling

Parallel jobs

Resource allocation is based on the number of jobs, not the slots required by each job. An EGO-enabled SLA requests resources based on velocity and the number of pending jobs. If a parallel job requires multiple processors, the SLA may request fewer processors than the job requires, which causes the job to remain pending. To avoid this, you can configure a larger velocity in the SLA.

Multicluster

Resource export under the lease model is not guaranteed. With EGO-enabled SLA scheduling, all resources are dynamic, so the exported hosts may not be allocated to LSF.

Advance reservations

EGO-enabled SLA does not support advance reservations. Advance reservations need to reserve resources for a specified time window, which is not currently supported in EGO.

Job-level resource requirements (bsub -R)

LSF takes the job-level resource requirement into consideration for scheduling. However, if that requirement does not match the resource requirement specified in the service class, the host allocated by EGO may not satisfy it, and the job remains pending. LSF treats the allocated host as idle and returns it to EGO. The pending job then causes another request to be sent to EGO, which allocates another host that may or may not satisfy the resource requirement.

Use EGO_RES_REQ=res_req in the service class configuration to specify all job resource requirements.