Platform LSF Administration Guide Version 6.2
External Load Indices and ELIM
Administering Platform LSF
The LSF Load Information Manager (LIM) collects built-in load indices that reflect the
load situations of CPU, memory, disk space, I/O, and interactive activities on individual
hosts.
While built-in load indices might be sufficient for most jobs, you might have special
workload or resource dependencies that require custom external load indices defined
and configured by the LSF administrator. Load and shared resource information from
external load indices is used in the same way as built-in load indices for job scheduling
and host selection.
You can write an External Load Information Manager (ELIM) program that collects the
values of configured external load indices and updates LIM when new values are
received.
An ELIM can be as simple as a small script, or as complicated as a sophisticated C
program. A well-defined protocol allows the ELIM to talk to LIM.
The ELIM executable must be located in LSF_SERVERDIR.
◆ “How LSF supports multiple ELIMs” on page 222
◆ “Configuring your application-specific SELIM” on page 223
◆ “How LSF uses ELIM for external resource collection” on page 223
◆ “Writing an ELIM” on page 224
◆ “Debugging an ELIM” on page 226
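As an illustration of how small an ELIM can be, the following is a minimal Python sketch, not a definitive implementation. It assumes the report format described in “Writing an ELIM” (one line on standard output containing the number of indices followed by name and value pairs); that format is not spelled out in this section. The index names scratch_free and licenses, the collect_indices function, and the 30-second interval are hypothetical.

```python
import sys
import time


def format_report(indices):
    """Build one report line: the index count, then name/value pairs."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.append(name)
        parts.append(str(value))
    return " ".join(parts)


def collect_indices():
    """Hypothetical collector; a real ELIM would measure something here."""
    return {"scratch_free": 47.5, "licenses": 5}


def run(interval_seconds=30, iterations=None):
    """Write a report to stdout each interval; iterations=None loops forever."""
    count = 0
    while iterations is None or count < iterations:
        sys.stdout.write(format_report(collect_indices()) + "\n")
        sys.stdout.flush()  # the reader is on a pipe, so do not leave data buffered
        count += 1
        time.sleep(interval_seconds)
```

Calling run() starts the reporting loop. A real ELIM would replace collect_indices with site-specific measurements and use the index names configured by the LSF administrator.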
How LSF supports multiple ELIMs
To increase LIM reliability, LSF Version 6.2 supports the configuration of multiple
ELIM executables.
Master ELIM (melim)
A master ELIM (melim) is installed in LSF_SERVERDIR.
melim manages multiple site-defined sub-ELIMs (SELIMs) and reports external load
information to LIM.
melim does the following:
◆ Starts and stops SELIMs
◆ Checks syntax of load information reporting on behalf of LIM
◆ Collects load information reported from SELIMs
◆ Merges latest valid load reports from each SELIM and sends merged load
information back to LIM
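The checking and merging steps above can be sketched in Python as follows. This is an illustration of the idea only, not how melim is implemented. It assumes a report line consists of an index count followed by name and numeric value pairs; the Merger class, the parse_report function, and the SELIM names are hypothetical.

```python
def parse_report(line):
    """Return {name: value} for a well-formed report line, else None."""
    fields = line.split()
    if not fields:
        return None
    try:
        count = int(fields[0])
    except ValueError:
        return None
    pairs = fields[1:]
    if count < 0 or len(pairs) != 2 * count:
        return None  # the declared count does not match the pairs present
    report = {}
    for i in range(0, len(pairs), 2):
        try:
            report[pairs[i]] = float(pairs[i + 1])
        except ValueError:
            return None  # non-numeric value (assumes indices are numeric)
    return report


class Merger:
    """Keeps the latest valid report per SELIM and merges them on demand."""

    def __init__(self):
        self.latest = {}  # SELIM name -> last valid report

    def accept(self, selim, line):
        """Store the report if it is valid; otherwise keep the previous one."""
        report = parse_report(line)
        if report is None:
            return False
        self.latest[selim] = report
        return True

    def merged(self):
        """Combine the latest valid report from every SELIM into one dict."""
        combined = {}
        for selim in sorted(self.latest):
            combined.update(self.latest[selim])
        return combined
```

In this sketch a malformed line is rejected without discarding the last valid report from the same SELIM, so the merged view stays usable.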
ELIM failure
Multiple SELIMs managed by a master ELIM increase reliability by protecting
LIM:
◆ ELIM output is buffered
◆ Incorrect resource formats or values are checked by ELIM
◆ SELIMs are independent of each other; one SELIM hanging while waiting for load
information does not affect the other SELIMs
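The isolation described above can be illustrated with a small Python sketch in which each source is read in its own thread and reports are buffered in a queue. This shows the general technique under stated assumptions and is not a description of melim internals; start_reader, drain, and the read_line callables are hypothetical names.

```python
import queue
import threading


def start_reader(name, read_line, reports):
    """Run one source's blocking read loop in its own daemon thread."""
    def loop():
        while True:
            line = read_line()  # may block indefinitely if the source hangs
            if line is None:    # the source has finished
                return
            reports.put((name, line))  # buffer the report for the collector

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def drain(reports, timeout):
    """Collect buffered reports until none arrives for `timeout` seconds."""
    collected = []
    while True:
        try:
            collected.append(reports.get(timeout=timeout))
        except queue.Empty:
            return collected
```

Because each reader blocks only its own thread, a source that never returns leaves the queue and the other readers unaffected.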
Error logging
melim logs its own activities and data to the log file
LSF_LOGDIR/melim.log.host_name.
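When debugging, it can help to compute the expected log file name for a host. The following Python sketch builds the path from the pattern above. The helper name melim_log_path is hypothetical, and whether LSF uses the short or the fully qualified host name in the file name is an assumption to verify on your cluster.

```python
import os
import socket


def melim_log_path(log_dir, host_name=None):
    """Return LSF_LOGDIR/melim.log.host_name for the given directory and host."""
    if host_name is None:
        # Assumption: the local host name as reported by the operating system
        host_name = socket.gethostname()
    return os.path.join(log_dir, "melim.log." + host_name)
```

For example, melim_log_path(os.environ["LSF_LOGDIR"]) gives the path for the local host if LSF_LOGDIR is set in the environment.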