Installing Standard LSF on a Subset of HP XC Nodes
Introduction
This document provides instructions for installing standard LSF on a subset of nodes in the XC cluster
(in our example a set of large SMP nodes or "fat" nodes) while maintaining LSF-HPC integrated with
SLURM on the rest of the nodes in the XC cluster (in our example the "thin" nodes).
This approach prevents jobs from running across both thin and fat nodes, but does offer full standard
LSF support for these large SMP systems, particularly job scheduling based on the size and load of
memory and/or CPU.
The existing XC cluster_config program allows you to decide which nodes have a "compute"
role and which nodes have a "resource_management" role. The "compute" nodes become SLURM
compute nodes. The "resource_management" nodes are where the SLURM master and backup
daemons reside, and one of them is selected to run the LSF-HPC daemons. Thus the existing
technology in XC allows you to "configure out" a subset of XC nodes that will not run LSF-HPC with
SLURM.
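The role-based split described above can be pictured with a small model. This is only an illustrative sketch: cluster_config is an interactive XC tool and is not driven this way, and the host names (n1 through n8), the role table, and the helper functions expand and partition are hypothetical. It also assumes that a node with neither the "compute" nor the "resource_management" role is one that has been "configured out".

```python
# Illustrative model only: host names, the role table, and both helpers
# are hypothetical and are not part of XC, SLURM, or LSF.

def expand(prefix, first, last):
    """Return host names prefix<first> .. prefix<last>, inclusive."""
    return [f"{prefix}{i}" for i in range(first, last + 1)]

def partition(all_nodes, roles):
    """Split nodes by role into SLURM compute nodes, resource-management
    nodes, and nodes left out of LSF-HPC with SLURM."""
    slurm_roles = {"compute", "resource_management"}
    compute = [n for n in all_nodes if "compute" in roles.get(n, set())]
    resource_mgmt = [n for n in all_nodes
                     if "resource_management" in roles.get(n, set())]
    # Assumption: a node holding neither role is "configured out".
    configured_out = [n for n in all_nodes
                      if not roles.get(n, set()) & slurm_roles]
    return compute, resource_mgmt, configured_out

all_nodes = expand("n", 1, 8)
roles = {name: {"compute"} for name in expand("n", 3, 7)}
roles["n8"] = {"resource_management"}
# n1 and n2 receive neither role, so they are candidates for standard LSF.

compute, resource_mgmt, configured_out = partition(all_nodes, roles)
print(configured_out)
```

In this model the nodes returned in configured_out correspond to the subset on which standard LSF would be installed.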
Before running the cluster_config command to adjust the role assignments, you need to install
standard LSF and perform some additional configuration to support standard LSF within XC.
Installing standard LSF on XC is straightforward; it involves only a few extra adjustments to
work with the file system management on XC. The following procedures cover all the necessary
adjustments.
Requirements
This procedure has the following requirements:
• Standard LSF must be the latest 6.0 version or later (with the schmod_slurm module).
• You must be familiar with LSF-HPC for SLURM installation and configuration on XC.
• You must be familiar with standard LSF installation and administration procedures.
• The XC head node cannot be configured to run standard LSF. This is due to an
unresolved issue in the LSF failover and setup mechanism on XC that will be corrected in the next
XC release.
Assumptions
The following assumptions apply to this procedure:
• LSF-HPC for SLURM was installed by the cluster_config process using default values.
• You have obtained a proper Platform LSF license.
• There is no desire to communicate with an external LSF cluster (this can be done, but involves
additional procedures to prepare the external network connections).
Sample Case
The example in this HowTo considers an XC cluster of 128 nodes consisting of:
• A head node with a hostname of xc128
• 6 large SMP nodes or "fat" nodes with the hostnames xc[1-6]
• 122 thin nodes. 114 of the "thin" nodes are compute nodes and have hostnames of xc[7-120].