LSF Version 7.3 - Using Platform LSF HPC
See the Platform LSF Command Reference for more information about the blaunch
command.
LSF APIs for the blaunch distributed application framework
LSF provides the following APIs for programming your own applications to use the
blaunch distributed application framework:
◆ lsb_launch(): a synchronous API call that allows source-level integration with
vendor MPI implementations. This API launches the specified command (argv) on the
remote nodes in parallel. LSF must be installed before you integrate your MPI
implementation with lsb_launch(). The lsb_launch() API requires the full set of
LSF libraries: liblsf.so and libbat.so (or liblsf.a and libbat.a).
◆ lsb_getalloc(): allocates memory for a host list to be used for launching
parallel tasks through blaunch and the lsb_launch() API. The caller is responsible
for freeing the host list when it is no longer needed. On success, the host list is
a list of strings; free each individual element before freeing the list itself. An
application using the lsb_getalloc() API is assumed to be part of an LSF job, with
LSB_MCPU_HOSTS set in the environment.
See the Platform LSF API Reference for more information about these APIs.
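The ownership rule for lsb_getalloc() (free every string, then the array) can be illustrated with a small stand-alone C sketch. build_host_list() and free_host_list() below are hypothetical helpers, not part of the LSF API and not how lsb_getalloc() is implemented. The sketch assumes the LSB_MCPU_HOSTS format of a host name followed by its slot count (for example, "hostA 2 hostB 1") and assumes one list entry per slot, which may differ from the layout lsb_getalloc() actually returns.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Portable strdup(): copy s into freshly allocated memory. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Release a host list: free each element first, then the array itself. */
void free_host_list(char **hostlist, int n)
{
    if (hostlist == NULL)
        return;
    for (int i = 0; i < n; i++)
        free(hostlist[i]);
    free(hostlist);
}

/* HYPOTHETICAL helper, not LSF's lsb_getalloc(). Parses an
 * LSB_MCPU_HOSTS-style string ("hostA 2 hostB 1") into a NULL-terminated
 * list with one entry per slot (assumed layout). Returns the number of
 * entries, or -1 on malformed input or allocation failure. */
int build_host_list(const char *mcpu_hosts, char ***hostlist)
{
    char name[256];
    int count, used, total = 0, n = 0;
    const char *p;
    char **list;

    assert(mcpu_hosts != NULL && hostlist != NULL);
    *hostlist = NULL;

    /* Pass 1: validate each "host count" pair and total the slots. */
    for (p = mcpu_hosts; sscanf(p, "%255s %d%n", name, &count, &used) == 2; p += used) {
        if (count <= 0)
            return -1;
        total += count;
    }
    if (total == 0 || sscanf(p, "%255s", name) == 1)
        return -1;          /* empty input or a dangling host name */

    /* Pass 2: allocate the array (+1 for the NULL terminator), then copy names. */
    list = calloc((size_t)total + 1, sizeof *list);
    if (list == NULL)
        return -1;
    for (p = mcpu_hosts; sscanf(p, "%255s %d%n", name, &count, &used) == 2; p += used) {
        for (int i = 0; i < count; i++) {
            list[n] = copy_string(name);
            if (list[n] == NULL) {
                free_host_list(list, n);   /* undo partial work */
                return -1;
            }
            n++;
        }
    }
    *hostlist = list;
    return n;
}
```

For example, build_host_list("hostA 2 hostB 1", &hosts) returns 3 and yields hostA, hostA, hostB; passing the same count to free_host_list() releases every element before the array. A list like this is what an application would then use to launch tasks through lsb_launch().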
The blaunch job environment
blaunch determines from the job environment which job it is running under and what
the job's allocation is, by examining the environment variables LSB_JOBID,
LSB_JOBINDEX, and LSB_MCPU_HOSTS. If any of these variables does not exist, blaunch
exits with a non-zero value. Similarly, if blaunch is used to start a task on a host
that is not listed in LSB_MCPU_HOSTS, the command exits with a non-zero value.
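The following C sketch mirrors the checks just described: every required variable must be present, and the target host must appear as a host name in LSB_MCPU_HOSTS; otherwise a non-zero status is returned. check_job_env(), host_in_allocation(), and validate_launch() are hypothetical names, not blaunch internals, and the sketch assumes the "host count host count ..." format of LSB_MCPU_HOSTS.

```c
#define _POSIX_C_SOURCE 200112L   /* expose setenv()/unsetenv() for callers */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 0 if all job variables are present, non-zero otherwise. */
int check_job_env(void)
{
    static const char *const required[] = {
        "LSB_JOBID", "LSB_JOBINDEX", "LSB_MCPU_HOSTS"
    };
    for (size_t i = 0; i < sizeof required / sizeof required[0]; i++) {
        if (getenv(required[i]) == NULL) {
            fprintf(stderr, "blaunch sketch: %s is not set\n", required[i]);
            return 1;
        }
    }
    return 0;
}

/* Return 1 if host is one of the host names in an LSB_MCPU_HOSTS-style
 * string ("hostA 2 hostB 1"), 0 otherwise. Slot counts are skipped. */
int host_in_allocation(const char *mcpu_hosts, const char *host)
{
    char name[256];
    int count, used;

    for (const char *p = mcpu_hosts; sscanf(p, "%255s %d%n", name, &count, &used) == 2; p += used) {
        if (strcmp(name, host) == 0)
            return 1;
    }
    return 0;
}

/* Combined check: non-zero status if the environment is incomplete or
 * the target host is outside the job's allocation. */
int validate_launch(const char *target_host)
{
    assert(target_host != NULL);
    if (check_job_env() != 0)
        return 1;
    if (!host_in_allocation(getenv("LSB_MCPU_HOSTS"), target_host)) {
        fprintf(stderr, "blaunch sketch: %s is not in LSB_MCPU_HOSTS\n", target_host);
        return 1;
    }
    return 0;
}
```

A wrapper could return validate_launch(host) from main() so that the process exits non-zero in the same two situations the text describes.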
In the job submission script, the blaunch command takes the place of rsh or ssh.
blaunch first sanity-checks the environment for LSB_JOBID and LSB_MCPU_HOSTS, then
contacts the job RES to validate the information determined from the job
environment. When the job RES receives the validation request from blaunch, it
registers with the root sbatchd to handle signals for the job.
The job RES periodically requests resource usage for the remote tasks, and this
message also acts as a heartbeat for the job. If no resource usage request is made
within the timeout period, the job is assumed to be gone and the remote tasks are
shut down. This timeout is configurable in an application profile in
lsb.applications.
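The heartbeat rule can be expressed as a simple watchdog check. The struct heartbeat type and its two functions are hypothetical, the timeout field is a placeholder for whatever value is configured in the application profile, and this sketches only the logic described above, not the job RES implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* HYPOTHETICAL state kept by a remote-task supervisor. */
struct heartbeat {
    time_t last_ru_request; /* when the last resource usage request arrived */
    double timeout_sec;     /* placeholder for the configured timeout */
};

/* Each resource usage request doubles as a heartbeat. */
void heartbeat_record(struct heartbeat *hb, time_t now)
{
    assert(hb != NULL);
    hb->last_ru_request = now;
}

/* True when no request has arrived within the timeout, meaning the job
 * is presumed gone and the remote tasks should be shut down. */
bool heartbeat_expired(const struct heartbeat *hb, time_t now)
{
    assert(hb != NULL);
    return difftime(now, hb->last_ru_request) > hb->timeout_sec;
}
```

With a 60-second timeout, a request recorded at t=1000 keeps the job alive through t=1060; at t=1061 heartbeat_expired() returns true and the supervisor would shut down the remote tasks.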
The blaunch command also honors the parameters LSB_CMD_LOG_MASK, LSB_DEBUG_CMD,
and LSB_CMD_LOGDIR when they are defined in lsf.conf or as environment variables.
The environment variables take precedence over the values in lsf.conf.
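To make the precedence rule concrete, here is a C sketch of how a command might resolve one of these parameters: it consults the environment first and falls back to a NAME=value line in lsf.conf text. conf_lookup() and effective_param() are hypothetical helpers, the simple parser is an assumption about lsf.conf syntax (it skips only lines that begin with # and ignores quoting), and this is not how blaunch itself reads its configuration.

```c
#define _POSIX_C_SOURCE 200112L   /* expose setenv()/unsetenv() for callers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL: find the first "NAME=value" line in lsf.conf-style text.
 * Lines starting with '#' are treated as comments; quoting is not handled.
 * Copies the value into out and returns 1 if found, else 0. */
int conf_lookup(const char *conf_text, const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL && outlen > 0);
    size_t nlen = strlen(name);
    const char *line = conf_text;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end != NULL ? (size_t)(end - line) : strlen(line);

        if (line[0] != '#' && len > nlen &&
            strncmp(line, name, nlen) == 0 && line[nlen] == '=') {
            size_t vlen = len - nlen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;      /* truncate to fit the buffer */
            memcpy(out, line + nlen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        line = end != NULL ? end + 1 : NULL;
    }
    return 0;
}

/* Environment variable takes precedence; fall back to lsf.conf; NULL if unset. */
const char *effective_param(const char *name, const char *conf_text,
                            char *buf, size_t buflen)
{
    const char *env = getenv(name);
    if (env != NULL)
        return env;
    return conf_lookup(conf_text, name, buf, buflen) ? buf : NULL;
}
```

For example, with LSB_CMD_LOGDIR=/tmp/lsf in the lsf.conf text (an illustrative value), effective_param() returns /tmp/lsf until LSB_CMD_LOGDIR is exported in the environment, after which the environment value is returned instead.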
To ensure that no other users can run jobs on hosts allocated to tasks launched by
blaunch, set LSF_DISABLE_LSRUN=Y in lsf.conf. When LSF_DISABLE_LSRUN=Y is defined,
RES refuses remote connections from
lsrun