LSF Version 7.3 - Administering Platform LSF
Goal-Oriented SLA-Driven Scheduling
LSF host partitions are typically used to implement user-based fairshare policies.
Complete the following steps to allow the EGO-enabled SLA to allocate hosts to
specific users.
1 Log on to the LSF master host as the cluster administrator.
2 Log on to the Platform Management Console:
a Define an EGO resource group that contains the selected hosts.
b Define an EGO consumer that is associated with this resource group.
3 Edit lsb.users and define a user group that configures the fairshare policy.
4 Edit lsb.serviceclasses and define a service class that uses the consumer and user group.
5 Run badmin reconfig to reconfigure mbatchd.
With this configuration, when a job is submitted to the defined SLA, it is scheduled according to the user-based fairshare policy within the selected hosts.
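As a sketch of step 3, the fairshare user group in lsb.users might look like the following (the group name, user names, and share values are hypothetical):

```
# lsb.users -- a user group whose members share the SLA hosts 2:1
Begin UserGroup
GROUP_NAME    GROUP_MEMBER       USER_SHARES
ugroup1       (user1 user2)      ([user1, 2] [user2, 1])
End UserGroup
```

After badmin reconfig, jobs are submitted to the service class with the -sla option, for example bsub -sla sla1 myjob (sla1 is a hypothetical service class name).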
Advanced configuration
lsb.serviceclasses
◆ EGO_RES_REQ=res_req—EGO resource requirement that specifies the characteristics of the hosts that EGO will assign to the SLA. Must be a valid EGO resource requirement. The EGO resource requirement string supports the select section, but the format is different from LSF resource requirements. For example:
EGO_RES_REQ=select(linux && maxmem > 100)
◆ MAX_HOST_IDLE_TIME=seconds—number of seconds that the SLA holds on to hosts that have no jobs running before LSF releases them to EGO. Each SLA can configure a different idle time. The default is 120 seconds. Do not set this parameter to a small value, or LSF may release hosts too quickly.
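Both parameters belong in a ServiceClass section of lsb.serviceclasses. A minimal stanza might look like the following (the service class name, consumer name, and velocity goal are illustrative, not prescribed values):

```
# lsb.serviceclasses -- hypothetical EGO-enabled service class
Begin ServiceClass
NAME               = sla_linux
CONSUMER           = linux_consumer
GOALS              = [VELOCITY 5 timeWindow ()]
EGO_RES_REQ        = select(linux && maxmem > 100)
MAX_HOST_IDLE_TIME = 300
End ServiceClass
```

Here idle hosts are held for 300 seconds rather than the 120-second default, trading slower release to EGO for fewer host reallocations.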
lsb.params
◆ DEFAULT_SLA_VELOCITY—the number of slots that the SLA should request for parallel jobs running in the SLA. By default, an EGO-enabled SLA requests slots from EGO based on the number of jobs the SLA needs to run, not the number of slots required by the jobs. If the jobs themselves require more than one slot, they will remain pending. To avoid this for parallel jobs, set DEFAULT_SLA_VELOCITY to the total number of slots that parallel jobs are expected to use.
◆ MBD_EGO_CONNECT_TIMEOUT—timeout, in seconds, for network I/O when connecting to the EGO vemkd. The default is 3 seconds.
◆ MBD_EGO_READ_TIMEOUT—timeout, in seconds, for network I/O when reading from the EGO vemkd. The default is 3 seconds.
◆ MBD_EGO_TIME2LIVE—how long EGO should keep information about host allocations in case mbatchd restarts. The default is 1440 minutes (24 hours).
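These parameters are set in the Parameters section of lsb.params. The values below are a sketch, not recommendations; the defaults described above apply when a parameter is omitted:

```
# lsb.params -- hypothetical EGO-related settings
Begin Parameters
DEFAULT_SLA_VELOCITY    = 8     # request 8 slots so an 8-way parallel job can start
MBD_EGO_CONNECT_TIMEOUT = 5     # seconds to wait when connecting to vemkd
MBD_EGO_READ_TIMEOUT    = 5     # seconds to wait for a reply from vemkd
MBD_EGO_TIME2LIVE       = 1440  # minutes EGO keeps allocation info across mbatchd restarts
End Parameters
```

Run badmin reconfig after editing lsb.params for the changes to take effect.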