LSF Version 7.3 - Using Platform LSF HPC

Running IBM POE Jobs
The IBM Parallel Operating Environment (POE) interfaces with the Resource Manager
to allow users to run parallel jobs requiring dedicated access to the high performance
switch.
The LSF integration for IBM High-Performance Switch (HPS) systems provides
support for submitting POE jobs from AIX hosts to run on IBM HPS hosts.
An IBM HPS system consists of multiple nodes running AIX. The system can be
configured with a high-performance switch to allow high bandwidth and low latency
communication between the nodes. The allocation of the switch to jobs as well as the
division of nodes into pools is controlled by the HPS Resource Manager.
hpc_ibm queue for POE jobs
During installation, lsfinstall configures a queue named hpc_ibm in lsb.queues
for running POE jobs. The queue defines requeue exit values so that POE jobs can
be requeued when some users submit jobs that require exclusive access to the node.
The poejob script exits with 133 if the job must be requeued. Do not submit other
types of jobs to this queue: they are requeued if they happen to exit with a
requeue exit value such as 133.
Begin Queue
QUEUE_NAME = hpc_ibm
PRIORITY = 30
NICE = 20
...
RES_REQ = select[ poe > 0 ]
REQUEUE_EXIT_VALUES = 133 134 135
...
DESCRIPTION = Platform LSF HPC 7 for IBM. This queue is to run POE jobs ONLY.
End Queue
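The effect of REQUEUE_EXIT_VALUES in the queue definition above can be
illustrated with a minimal Python sketch. It is not LSF code: the function names
are hypothetical, and it assumes the parameter holds only plain integer exit
codes, as in the hpc_ibm example. It shows why a non-POE job that happens to
exit with 133 would be requeued rather than finished.

```python
def parse_requeue_exit_values(line):
    """Parse a 'REQUEUE_EXIT_VALUES = 133 134 135' line into a set of ints.

    Assumption: only plain integer exit codes appear on the line.
    """
    _, _, values = line.partition("=")
    return {int(token) for token in values.split()}


def should_requeue(exit_code, requeue_values):
    """Return True if a job exiting with exit_code would be requeued."""
    return exit_code in requeue_values


# Value taken from the hpc_ibm queue definition shown above.
REQUEUE = parse_requeue_exit_values("REQUEUE_EXIT_VALUES = 133 134 135")

for code in (0, 1, 133):
    action = "requeue" if should_requeue(code, REQUEUE) else "finish"
    print(code, action)
```

Any job in the queue is treated the same way, whether or not it was started by
the poejob script, which is why the queue should run POE jobs only.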
Configuring LSF to run POE jobs
Ensure that the HPS node names are the same as their host names. That is, st_status
should return the same names for the nodes that lsload returns.
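The name check described above can be sketched in Python as a simple set
comparison. This is an illustration under stated assumptions: the two lists are
assumed to have been copied by hand from the st_status and lsload output (the
sketch does not run or parse either command), and the function name and the
host names hostA through hostD are hypothetical.

```python
def compare_node_names(switch_nodes, lsf_hosts):
    """Compare node names reported by the switch with LSF host names.

    Returns two sorted lists: names known only to the switch, and names
    known only to LSF. Both lists are empty when the names agree.
    """
    switch_set = set(switch_nodes)
    lsf_set = set(lsf_hosts)
    return sorted(switch_set - lsf_set), sorted(lsf_set - switch_set)


# Hypothetical names, copied manually from st_status and lsload output.
from_st_status = ["hostA", "hostB", "hostC"]
from_lsload = ["hostA", "hostB", "hostD"]

only_switch, only_lsf = compare_node_names(from_st_status, from_lsload)
print("Only in st_status:", only_switch)
print("Only in lsload:", only_lsf)
```

If either list is non-empty, the node names and host names do not match and
should be reconciled before continuing with the configuration steps.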
1. Configure per-slot resource reservation (lsb.resources).
2. Optional. Enable exclusive mode (lsb.queues).
3. Optional. Define resource management pools (rmpool) and node locking queue
threshold.
4. Optional. Define system partitions (spname).
5. Allocate switch adapter specific resources.
6. Optional. Tune PAM parameters.
7. Reconfigure to apply the changes.
To support the IBM HPS architecture, LSF must reserve resources based on job slots.
During installation, lsfinstall configures the ReservationUsage section in
lsb.resources to reserve HPS resources on a per-slot basis.
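One way to confirm the per-slot setting after installation is to read the
ReservationUsage section and list the reservation method for each resource. The
Python sketch below is an illustration only. It assumes the section uses
RESOURCE and METHOD columns with PER_SLOT as the per-slot method, and the
resource names in the sample text (adapter_windows, ntbl_windows) are
assumptions for the example, not values taken from the text above. The function
name is hypothetical.

```python
def parse_reservation_usage(text):
    """Return {resource: method} from a ReservationUsage section.

    Assumption: the section is delimited by 'Begin ReservationUsage' and
    'End ReservationUsage' and has a 'RESOURCE METHOD' header row.
    """
    methods = {}
    in_section = False
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        normalized = " ".join(line.split()).lower()
        if normalized == "begin reservationusage":
            in_section = True
            continue
        if normalized == "end reservationusage":
            in_section = False
            continue
        if in_section:
            fields = line.split()
            # Skip the header row; keep 'resource method' pairs.
            if len(fields) >= 2 and fields[0].upper() != "RESOURCE":
                methods[fields[0]] = fields[1].upper()
    return methods


# Sample text with assumed resource names, for illustration only.
SAMPLE = """
Begin ReservationUsage
RESOURCE          METHOD
adapter_windows   PER_SLOT
ntbl_windows      PER_SLOT
End ReservationUsage
"""

usage = parse_reservation_usage(SAMPLE)
not_per_slot = sorted(name for name, method in usage.items()
                      if method != "PER_SLOT")
print(usage)
print("Not reserved per slot:", not_per_slot)
```

In practice the text would come from the site's own lsb.resources file; an empty
"not reserved per slot" list indicates that every listed resource is reserved
on a per-slot basis.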