LSF Version 7.3 - Using Platform LSF HPC
PAM updates resource usage for each task every SBD_SLEEP_TIME +
num_tasks * 1 seconds (by default, SBD_SLEEP_TIME=15). For large parallel
jobs, this interval is too long. As the number of parallel tasks increases,
LSF_PAM_RUSAGE_UPD_FACTOR causes more frequent updates.
Default: LSF_PAM_RUSAGE_UPD_FACTOR=0.01
For large clusters
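To make the effect of the factor concrete, the following sketch estimates the update interval for a given number of tasks. It assumes that LSF_PAM_RUSAGE_UPD_FACTOR replaces the multiplier of 1 in the per-task term, that is, SBD_SLEEP_TIME + num_tasks * factor; the text above does not spell out this formula. The helper name pam_update_interval is hypothetical and is not part of LSF.

```shell
# Hypothetical helper: estimate the PAM resource usage update interval (seconds).
# Assumption: the factor scales the per-task term,
#   interval = SBD_SLEEP_TIME + num_tasks * factor
# Usage: pam_update_interval NUM_TASKS [FACTOR] [SBD_SLEEP_TIME]
pam_update_interval() {
    num_tasks=$1
    factor=${2:-1}            # multiplier of 1 when no factor is given
    sbd_sleep_time=${3:-15}   # default SBD_SLEEP_TIME
    LC_ALL=C awk -v n="$num_tasks" -v f="$factor" -v s="$sbd_sleep_time" \
        'BEGIN { printf "%.2f\n", s + n * f }'
}

pam_update_interval 1024        # multiplier of 1
pam_update_interval 1024 0.01   # LSF_PAM_RUSAGE_UPD_FACTOR=0.01
```

Under this assumption, a 1024-task job is updated roughly every 1039 seconds with a multiplier of 1, and roughly every 25 seconds with a factor of 0.01.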
Run badmin ckconfig to check the configuration changes.
If any errors are reported, fix the problem and check the configuration again.
Reconfigure the cluster:
badmin reconfig
Checking configuration files ...
No errors found.
Do you want to reconfigure? [y/n] y
Reconfiguration initiated
LSF checks for any configuration errors. If no fatal errors are found, you are asked
to confirm reconfiguration. If fatal errors are found, reconfiguration is aborted.
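The check-then-reconfigure sequence can be wrapped in a small script, sketched below. It assumes that badmin ckconfig exits with a non-zero status when it reports errors; verify this on your installation before relying on it. The function name check_and_reconfig and the BADMIN variable are hypothetical.

```shell
# Command to invoke; override BADMIN to point at a specific badmin binary.
BADMIN=${BADMIN:-badmin}

# Check the configuration and start reconfiguration only if the check passes.
# Assumption: "badmin ckconfig" exits non-zero when it reports errors.
check_and_reconfig() {
    if "$BADMIN" ckconfig; then
        # badmin reconfig still asks for confirmation before proceeding.
        "$BADMIN" reconfig
    else
        echo "Configuration check failed; fix the errors and check again." >&2
        return 1
    fi
}
```

Call check_and_reconfig from an administrator shell after editing the configuration files.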
POE ELIM (elim.hpc)
An external LIM (ELIM) for POE jobs is supplied with LSF.
On IBM HPS systems, ELIM uses the st_status or ntbl_status command to collect
information from the Resource Manager.
The ELIM searches the following path for the poe and st_status commands:
PATH="/usr/bin:/bin:/usr/local/bin:/local/bin:/sbin:/usr/sbin:/usr/ucb:/usr/sbin:/usr/bsd:${PATH}"
If these commands are installed in a different directory, you must modify the PATH
variable in LSF_SERVERDIR/elim.hpc to point to the correct directory.
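As an illustration of such a change, the sketch below prepends a site-specific directory to the search path and then reports any required command that still cannot be found. The directory /opt/example/poe/bin is a placeholder and not a real default; substitute the location used at your site.

```shell
# Hypothetical directory where poe and st_status are installed at your site.
POE_BIN_DIR="/opt/example/poe/bin"

# Same search path as the ELIM uses, with the site directory prepended.
PATH="${POE_BIN_DIR}:/usr/bin:/bin:/usr/local/bin:/local/bin:/sbin:/usr/sbin:/usr/ucb:/usr/sbin:/usr/bsd:${PATH}"

# Report any command the ELIM needs that is still not found on PATH.
for cmd in poe st_status; do
    command -v "$cmd" >/dev/null 2>&1 || echo "not found on PATH: $cmd" >&2
done
```

Running the check by hand before restarting the LIM is a quick way to confirm that the edited path resolves both commands.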
POE esub (esub.poe)
The esub for POE jobs, esub.poe, is installed by lsfinstall. It is invoked using
the -a poe option of bsub. By default, the POE esub sets the environment variable
LSF_PJL_TYPE=poe. The job launcher, mpirun.lsf, reads the environment variable
LSF_PJL_TYPE=poe and generates the appropriate pam command line to invoke
POE to start the job.
The value of the bsub -n option overrides the POE -procs option. If no -n is used,
the esub sets default values with the variables LSB_SUB_NUM_PROCESSORS=1
and LSB_SUB_MAX_NUM_PROCESSORS=1.
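A submission that goes through the POE esub might look like the sketch below. It assumes the common form in which mpirun.lsf is followed by the program name; the helper poe_submit_cmd and the program ./my_app are hypothetical. The helper only prints the command line so that it can be inspected before it is run. Because the bsub -n value overrides the POE -procs option, the task count is given through -n alone.

```shell
# Hypothetical helper: build (but do not run) a bsub command line for a POE job.
# Usage: poe_submit_cmd NUM_TASKS PROGRAM [ARGS...]
poe_submit_cmd() {
    num_tasks=$1
    shift
    # -a poe invokes esub.poe; -n sets the number of tasks.
    echo "bsub -a poe -n ${num_tasks} mpirun.lsf $*"
}

# Print the command for a 4-task job; ./my_app is a placeholder program name.
poe_submit_cmd 4 ./my_app
```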
If you specify -euilib us (US mode), then -euidevice must be css0 or csss (the
HPS for interprocess communications).
The -euidevice sn_all option is supported. The -euidevice sn_single option is
ignored. POE jobs submitted with -euidevice sn_single use -euidevice sn_all.
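The sn_single substitution described above can be summarized as a small mapping. This is only an illustrative sketch of the stated behavior, not code taken from esub.poe; the function name effective_euidevice is hypothetical.

```shell
# Hypothetical helper: map a requested -euidevice value to the one the job uses.
effective_euidevice() {
    case "$1" in
        sn_single) echo "sn_all" ;;   # sn_single is ignored; sn_all is used
        *)         echo "$1" ;;       # other values pass through unchanged
    esac
}

effective_euidevice sn_single
```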