
The following general parameters are configured:
• The MaxJobCount parameter is based on the number of CPUs in the HP XC system and the number of preemption queues to be used in LSF, to ensure that allocations are available for LSF jobs. The default value is 2000 jobs.
• The MinJobAge parameter is set to a value (1 hour or greater) that gives LSF enough time to obtain job status information after a job finishes. The default value is 300 seconds. A value of zero prevents any job record purging.
• The ReturnToService parameter is set to 1 so that a DOWN node becomes available for use when it registers. The default value is 0, which means that a node remains in the DOWN state until you explicitly change its state (even if the slurmd daemon registers and resumes communications).
• The lsf partition is required by LSF-HPC to identify the nodes available for its management. The RootOnly setting ensures that only the superuser (root) can request use of these nodes; the LSF-HPC daemons are run by root. The Shared=FORCE setting enables LSF-HPC to dispatch more than one job to a node in this partition, which facilitates preemption and the efficient use of resources for serial (single-processor) jobs.
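For reference, a minimal sketch of how these settings might appear in the slurm.conf file follows; the node list in the partition definition is illustrative only and differs from system to system:
MaxJobCount=2000
MinJobAge=300
ReturnToService=1
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[1-16]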
The spconfig command works in conjunction with the cluster_config utility to configure SLURM for
HP XC. The spconfig command is run after all the nodes in the cluster are up and running. The spconfig
command performs three main functions:
• It configures an elanhosts configuration file for use by SLURM ELAN support on systems with a Quadrics interconnect.
• It sets accurate values for the Procs and RealMemory settings in the slurm.conf file for all the compute nodes. This data is not known until the compute nodes have booted and registered with the HP XC database.
• It restarts SLURM across the cluster.
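For example, after the compute nodes have registered, the node entries that spconfig writes to the slurm.conf file might resemble the following; the node range, processor count, and memory size shown here are illustrative only:
NodeName=n[1-126] Procs=2 RealMemory=3891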
Although a number of configuration options are available, after the cluster_config utility and the spconfig command execute, the slurm.conf file is generally set up to perform optimally on an HP XC system. However, you might want to change the node characteristics or the assignment of nodes to partitions to suit your site's needs.
The following sections describe some of the common configuration changes to the slurm.conf file.
Configuring SLURM System Interconnect Support
SLURM has system interconnect support for Quadrics ELAN, which assists MPI jobs with the global exchange
process during startup, when each process is establishing the communication channels with the other processes
in the job.
The SwitchType SLURM configuration setting is set during cluster_config and cannot be adjusted by the installer except by editing the slurm.conf file manually. The cluster_config process queries the HP XC configuration and management database for the HP XC system interconnect type; if it is Quadrics Elan, SwitchType is set to switch/elan. Otherwise, it is set to switch/none. This setting enables or disables SLURM support for Quadrics Elan.
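For example, on a system with a Quadrics interconnect, the slurm.conf file contains the following setting:
SwitchType=switch/elan
On a system with any other interconnect, the setting is SwitchType=switch/none.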
If the SwitchType setting is adjusted manually, you will need to restart SLURM:
# cexec -a service slurm restart
Configuring SLURM Servers
The ControlMachine and BackupController settings are configured by the cluster_config utility; these settings are the host names of the primary and backup controllers, respectively. From among the nodes with the resource management role, the installer chooses the node that runs the master slurmctld daemon and (if more than one node has the resource management role) the node that runs the backup slurmctld daemon.
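For example, on a system where two nodes have the resource management role, the resulting entries in the slurm.conf file might resemble the following; the host names shown are illustrative only:
ControlMachine=n16
BackupController=n15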
Be sure to shut down SLURM on the HP XC system before adjusting these settings manually.
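For example, assuming that the SLURM service script accepts stop and start actions in the same manner as the restart action shown previously, the sequence might be:
# cexec -a service slurm stop
(edit the slurm.conf file as needed)
# cexec -a service slurm start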
See the HP XC System Software Installation Guide for information about changing the choice of primary and backup nodes for SLURM by using the cluster_config utility.