HP XC System Software Administration Guide Version 3.1

Table 14-1 SLURM Configuration Settings (continued)
Setting       Default Value*
SwitchType    switch/elan for systems with the Quadrics interconnect
              switch/none for systems with any other interconnect
* Default values can be adjusted during installation.
You can also use the scontrol show config command to examine the current SLURM configuration.
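The full output of scontrol show config is long, so it can help to filter it down to the settings discussed in this section. The following is a minimal sketch, not part of the HP XC tooling: the function name show_slurm_settings is hypothetical, and the sketch assumes the output uses "Name = value" lines.

```shell
# Print only the settings discussed in this section from
# "scontrol show config"-style text read on standard input.
# Assumes each setting appears as "Name = value" on its own line.
show_slurm_settings() {
    grep -E '^[[:space:]]*(MaxJobCount|MinJobAge|ReturnToService|SwitchType)[[:space:]]*='
}

# Typical use on a running system:
#   scontrol show config | show_slurm_settings
```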
The following general parameters are configured:
• The MaxJobCount parameter is based on the number of CPUs in the HP XC system and the number
  of preemption queues to be used in LSF, to ensure that allocations are available for LSF jobs. The
  default value is 2000 jobs.
• The MinJobAge parameter is set to a value (1 hour or greater) that gives LSF enough time to obtain
  job status information after a job finishes. The SLURM default value is 300 seconds. A value of zero
  prevents any job record purging.
• The ReturnToService parameter is set to 1, so that a DOWN node becomes available for use when it
  registers. The SLURM default value is 0, which means that a node remains in the DOWN state until
  you explicitly change its state (even if the slurmd daemon registers and resumes communications).
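As an illustration only, the general parameters described above might appear in the slurm.conf file as follows. The values shown are assumptions chosen to match the text (the 2000-job default and the 1-hour minimum for MinJobAge), not the values on any particular system.

```
# Illustrative slurm.conf excerpt; the values are assumptions
MaxJobCount=2000       # default; the configured value depends on CPU count and LSF preemption queues
MinJobAge=3600         # seconds; 1 hour, the minimum described above
ReturnToService=1      # a DOWN node returns to service when it registers
```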
The lsf partition is required by LSF-HPC with SLURM to identify the nodes available for its management.
The RootOnly setting ensures that only the superuser (root) can request use of these nodes; the LSF-HPC
with SLURM daemons are run by root. The Shared=FORCE setting enables LSF-HPC with SLURM to
dispatch more than one job to a node in this partition, which facilitates preemption and the efficient use
of resources for serial (single-processor) jobs.
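A partition definition with these settings might look like the following slurm.conf line. This is a sketch: the node list n[1-16] is a hypothetical example, and the actual line on a given system may carry additional settings.

```
# Hypothetical partition line; the node list n[1-16] is an assumption
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[1-16]
```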
The spconfig command works in conjunction with the cluster_config utility to configure SLURM
for HP XC. The spconfig command is run after all the nodes in the HP XC system are up and running.
The spconfig command performs three main functions:
• It configures an elanhosts configuration file for use by SLURM ELAN support for systems with a
  Quadrics interconnect.
• It accurately configures the Procs and RealMemory settings in the slurm.conf file for all the
  compute nodes. This data is not known until the compute nodes are booted and are registered with
  the HP XC database.
• It restarts SLURM across the HP XC system.
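To make the second function concrete, a node definition in the slurm.conf file after spconfig has run might resemble the following line. The node names and the Procs and RealMemory values here are hypothetical; spconfig fills in the values it obtains for the actual compute nodes.

```
# Hypothetical node definition; node names and values are assumptions
NodeName=n[1-16] Procs=2 RealMemory=2048
```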
Although a number of options are available, after the cluster_config utility and the spconfig
command execute, the slurm.conf file is generally set up to perform optimally on an HP XC system.
However, you might want to change the node characteristics or the assignment of nodes to partitions to
suit your site needs. The following sections describe some of these common configuration changes to the
slurm.conf file.
14.2.1 Configuring SLURM System Interconnect Support
SLURM has system interconnect support for Quadrics ELAN, which assists MPI jobs with the global
exchange process during startup, when each process is establishing the communication channels with the
other processes in the job.
The SwitchType SLURM configuration setting is set by the cluster_config utility and cannot be adjusted
by the installer, except by editing it manually. The cluster_config process queries the CMDB for the
HP XC system interconnect type; if it is Quadrics ELAN, SwitchType is set to switch/elan. Otherwise, it
is set to switch/none. This setting enables or disables SLURM support for Quadrics ELAN.
If you adjust the SwitchType setting manually, you must restart SLURM:
# cexec -a service slurm restart
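A manual change to SwitchType can be scripted so that only the two values named in this section are accepted. The following is a minimal sketch under stated assumptions: the function set_switch_type is hypothetical (it is not an HP XC or SLURM command), it takes the path of the configuration file as an argument rather than assuming a location, and it expects the setting to be written as SwitchType=value at the start of a line.

```shell
# Rewrite the SwitchType line in a slurm.conf-style file.
# Usage: set_switch_type FILE switch/elan|switch/none
# If the file has no SwitchType line, the file content is left unchanged.
set_switch_type() {
    conf=$1
    value=$2
    # Accept only the two values described in this section.
    case $value in
        switch/elan|switch/none) ;;
        *) echo "unsupported SwitchType: $value" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Replace the existing SwitchType line; copy every other line as is.
    if sed "s|^SwitchType=.*|SwitchType=$value|" "$conf" > "$tmp"; then
        # Write back in place so the file keeps its owner and permissions.
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}
```

After changing the real configuration file this way, restart SLURM with the cexec command shown above so that the new setting takes effect.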
14.2.2 Configuring SLURM Servers
The ControlMachine and BackupController settings are configured by the cluster_config
utility; these settings are the host names of the primary and backup controllers, respectively.
The installer chooses from among the nodes with the resource management role the node to run the master