LSF Version 7.3 - Using Platform LSF HPC

Tips for Writing PJL Wrapper Scripts
A wrapper script is often used to call the PJL. We assume the PJL is not integrated with
LSF, so if PAM were to start the PJL directly, the PJL would not automatically use the
hosts that LSF selected or allow LSF to collect resource information.
The wrapper script can set up the environment before starting the actual job.
The script should create and use its own log file, for troubleshooting purposes. For
example, it should log a message each time it runs a command, and it should also log the
result of the command. The first entry might record the successful creation of the log
file itself.
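The following is a minimal Python sketch of this logging pattern. It uses the standard `logging` and `subprocess` modules. The logger name `pjl_wrapper` and the helper names `make_logger` and `run_logged` are illustrative and are not part of LSF.

```python
import logging
import subprocess


def make_logger(path):
    """Create a logger that writes to the wrapper's own log file."""
    logger = logging.getLogger("pjl_wrapper")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    # First entry: record that the log file itself was created.
    logger.info("log file %s created", path)
    return logger


def run_logged(logger, argv):
    """Log a command before running it, then log its exit status."""
    command = " ".join(argv)
    logger.info("running: %s", command)
    status = subprocess.run(argv).returncode
    logger.info("exit status %d: %s", status, command)
    return status
```

A call such as `run_logged(log, ["/bin/hostname"])` leaves two lines in the log: the command and its result.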
Set up aliases for the commands used in the script, and identify the full path to each
command. Use the alias throughout the script instead of calling the command directly.
This makes it simple to change the path or the command later by editing just one
line.
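A minimal Python equivalent of command aliases keeps every full path in one table and builds argument lists from it. The paths shown are hypothetical examples, not LSF or MPI defaults. Adjust them for your site.

```python
# One place to record the full path of every external command the script uses.
# These paths are hypothetical examples.
COMMANDS = {
    "mpirun": "/opt/mpi/bin/mpirun",
    "rm": "/bin/rm",
}


def cmd(name, *args):
    """Build an argument list that uses the aliased full path instead of the bare name."""
    return [COMMANDS[name], *args]
```

Moving to a different MPI installation then means editing only the `"mpirun"` entry.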
If the script is interrupted or terminated before it finishes, it should exit gracefully and
undo any work it started. This might include closing files it was using, removing files it
created, shutting down daemons it started, and recording the signal event in the log file
for troubleshooting purposes.
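The following is a minimal Python sketch of this cleanup logic. It assumes the wrapper records every file it creates and every daemon it starts as it goes. The class name `Cleanup` is hypothetical. Exiting with 128 plus the signal number follows a common shell convention that was chosen for this example; LSF does not require it.

```python
import os
import signal
import sys


class Cleanup:
    """Track work the wrapper has started so it can be undone if the job is interrupted."""

    def __init__(self, log=print):
        self.log = log
        self.files = []       # files this script created
        self.processes = []   # daemons (subprocess.Popen objects) this script started

    def add_file(self, path):
        self.files.append(path)

    def add_process(self, proc):
        self.processes.append(proc)

    def undo(self):
        # Undo work in reverse order: stop daemons first, then remove files.
        for proc in reversed(self.processes):
            if proc.poll() is None:
                proc.terminate()
                proc.wait()
                self.log(f"stopped process {proc.pid}")
        for path in reversed(self.files):
            if os.path.exists(path):
                os.remove(path)
                self.log(f"removed {path}")
        self.processes.clear()
        self.files.clear()

    def handler(self, signum, frame):
        # Record the signal, undo partial work, and exit with a 128+N status.
        self.log(f"caught signal {signum}, cleaning up")
        self.undo()
        sys.exit(128 + signum)

    def install(self):
        # Route interrupt and terminate signals to the cleanup handler.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self.handler)
```

The wrapper would call `install()` early, before it creates any files or starts any daemons.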
In LSF, job requeue is an optional feature that depends on the job’s exit value. PAM exits
with the same exit value as the PJL or its wrapper script, so you can make some or all
errors in the script exit with a special value that causes LSF to requeue the job.
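The following is a minimal sketch of choosing the wrapper's exit value. The value 99 is hypothetical. It causes a requeue only if LSF is configured to treat it as a requeue exit value, for example through the queue's REQUEUE_EXIT_VALUES parameter. In all other cases the PJL's own exit value is passed through unchanged.

```python
# Hypothetical requeue value. It must match the requeue exit values configured in LSF.
REQUEUE_EXIT = 99


def exit_status(pjl_status, transient_error=False):
    """Pick the wrapper's exit value: requeue on transient errors, else pass the PJL status through."""
    if transient_error:
        return REQUEUE_EXIT
    return pjl_status
```

What counts as a "transient" error, such as a daemon that failed to start, is a site decision.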
Redirect any screen output you do not need to /dev/null.
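In Python, the standard `subprocess.DEVNULL` constant sends a child process's output to the null device. This sketch discards standard output only. Keeping standard error is a judgment call, made so that error messages are not silently lost.

```python
import subprocess


def run_quiet(argv):
    """Run a command with its standard output sent to the null device (/dev/null)."""
    # stderr is left alone so error messages remain visible.
    result = subprocess.run(argv, stdout=subprocess.DEVNULL)
    return result.returncode
```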
Set LSF_ENVDIR and source the lsf.conf file. This gives you access to LSF
configuration settings.
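A shell wrapper would typically export LSF_ENVDIR and then run `. $LSF_ENVDIR/lsf.conf`. Python cannot source a shell file, so this sketch reads simple KEY=VALUE lines instead. It assumes that lsf.conf contains only plain assignments and comments, and it performs no shell variable expansion.

```python
import os


def read_lsf_conf(envdir):
    """Set LSF_ENVDIR and read simple KEY=VALUE settings from its lsf.conf.

    This only approximates sourcing the file in a shell: blank lines and
    comments are skipped, surrounding quotes are stripped, and no variable
    expansion is performed.
    """
    os.environ["LSF_ENVDIR"] = envdir  # visible to commands the wrapper starts
    settings = {}
    with open(os.path.join(envdir, "lsf.conf")) as conf:
        for line in conf:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip("\"'")
    return settings
```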
The hosts LSF has selected to run the job are described by the environment variable
LSB_MCPU_HOSTS. This environment variable specifies a list, in quotes, consisting of
one or more host names paired with the number of processors to use on that host:
"host_name number_processors host_name number_processors ..."
Parse this variable into its components and create a host file in the specific format
required by the vendor PJL. In this way, the hosts LSF has chosen are passed to the PJL.
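The following is a minimal Python sketch of the parsing step. The host file format written here, one line per processor slot, is an assumption made for illustration. Check the vendor's documentation for the format its PJL actually expects.

```python
def parse_mcpu_hosts(value):
    """Split 'host n host n ...' into a list of (host, n) pairs."""
    # Strip surrounding quotes in case the value was copied with them.
    fields = value.strip().strip('"').split()
    if not fields or len(fields) % 2 != 0:
        raise ValueError(f"malformed LSB_MCPU_HOSTS: {value!r}")
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]


def write_host_file(pairs, path):
    """Write one line per processor slot (hypothetical format; use the one your PJL requires)."""
    with open(path, "w") as host_file:
        for host, nproc in pairs:
            for _ in range(nproc):
                host_file.write(host + "\n")
```

In the wrapper, the input would be `os.environ["LSB_MCPU_HOSTS"]`. A value such as `hostA 2 hostB 1` yields `[("hostA", 2), ("hostB", 1)]`.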
Depending on the vendor, the PJL may require some special pre-execution work, such
as initializing environment variables or starting daemons. You should log each pre-exec
task in the log file, and also check the result and handle errors if a required task fails.
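The following is a minimal sketch of running pre-exec tasks with logging and error checks. Each task is a hypothetical `(description, argv, required)` tuple. If an optional task fails, the failure is logged and the sequence continues. If a required task fails, the sequence stops so that the wrapper can exit.

```python
import subprocess


def run_pre_exec(tasks, log):
    """Run (description, argv, required) tasks in order; stop at the first required failure."""
    for description, argv, required in tasks:
        log(f"pre-exec: {description}")
        status = subprocess.run(argv).returncode
        log(f"pre-exec: {description} exited with {status}")
        if status != 0 and required:
            log(f"pre-exec: required task failed: {description}")
            return False
    return True
```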
If an external resource is used to identify MPI-enabled hosts, LSF has selected hosts
based on the availability of that resource. However, there is some time delay between
LSF scheduling the job and the script starting the PJL. It’s a good idea to make the script
verify that required resources are still available on the selected hosts (and exit if the hosts
are no longer able to execute the parallel job). Do this immediately before starting the
PJL.
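How to test for the resource depends on the site. In this sketch, the per-host check is therefore a hypothetical function supplied by the caller. The wrapper should exit with an error if this function returns False.

```python
def hosts_still_usable(hosts, check, log):
    """Return True only if check(host) succeeds for every host LSF selected."""
    unusable = [host for host in hosts if not check(host)]
    for host in unusable:
        log(f"required resource no longer available on {host}")
    return not unusable
```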
The most important function of the wrapper script is to start the PJL and have it execute
the parallel job on the hosts selected by LSF. Normally, you use a version of the mpirun
command.
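The following is a minimal sketch of starting the PJL. The `-np` and `-machinefile` options are assumptions. Option names differ between MPI implementations (some use `-hostfile`, for example), so check the vendor's mpirun documentation.

```python
import subprocess


def build_mpirun_command(mpirun, host_file, nproc, program, *args):
    """Assemble an mpirun command line; the option names are vendor-specific assumptions."""
    return [mpirun, "-np", str(nproc), "-machinefile", host_file, program, *args]


def start_pjl(argv, log):
    """Start the PJL and return its exit status so the wrapper can exit with it."""
    log("starting PJL: " + " ".join(argv))
    status = subprocess.run(argv).returncode
    log(f"PJL exited with {status}")
    return status
```

The wrapper can finish with `sys.exit(start_pjl(argv, log))`. Because PAM exits with the wrapper's exit value, LSF then sees the PJL's status.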