Initially, all nodes with the compute role are listed as SLURM compute nodes, and these
nodes are configured statically with a processor count of two.
A sample default compute node entry in the slurm.conf file looks like this:
NodeName=n[1-64] Procs=2
SLURM provides the ability to set several other compute node characteristics. At a
minimum, you should ensure that the processor count is accurate. You should also set the
RealMemory and TmpDisk characteristics so that SLURM can monitor those values on
the nodes and users can submit jobs that request specific values.
A sample updated entry might look like this:
NodeName=n[1-59] Procs=2 RealMemory=2048 TmpDisk=9036
NodeName=n[60-64] Procs=4 RealMemory=4096 TmpDisk=16384 Weight=2
See the slurm.conf man page for more information about setting compute node
characteristics.
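For example, with RealMemory and TmpDisk set as shown above, a user could ask SLURM
for nodes that meet specific resource minimums. The following invocation is a hypothetical
sketch; it assumes SLURM's standard --mem and --tmp srun options are available in your
SLURM version:
$ srun -N 2 --mem=2048 --tmp=9036 ./my_application   # my_application is a placeholder
This requests two nodes, each with at least 2048 MB of real memory and 9036 MB of
temporary disk space.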
• Compute node partition layout
Initially, all nodes are placed into one partition for exclusive management by LSF HPC.
If you would like to set aside some nodes for non-LSF use, you must configure a second
partition for those nodes.
A sample default partition entry in the slurm.conf file looks like this:
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[1-64]
An updated partition configuration to create a second partition for direct SLURM use by
users might look like this:
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[10-64]
PartitionName=srun Default=YES Nodes=n[1-9]
See the slurm.conf man page for more information on configuring partitions.
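With a layout like this in place, users could launch jobs directly on the non-LSF partition.
The following command is a hypothetical sketch assuming SLURM's standard -p (partition)
and -N (node count) srun options:
$ srun -p srun -N 2 hostname   # run hostname on two nodes of the srun partition
This bypasses LSF entirely and allocates nodes only from the srun partition defined above.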
If you make any manual changes to the slurm.conf file, restart SLURM on the head node:
# service slurm restart
You might see service startup error messages if the resource_management or compute
roles are not assigned to the head node. These errors can be safely ignored.
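After the restart, you can confirm that SLURM picked up the new configuration. One
possible check, assuming the standard SLURM sinfo utility is installed, is:
# sinfo
The output should list each configured partition (for example, lsf and srun) along with its
node ranges and node states.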
If you need additional information about SLURM, a Reference Manual is available at the
following URL:
http://www.llnl.gov/LCdocs/slurm/
Proceed to Section 4.10 to start the system and propagate the golden image to all client nodes.
4.10 Start the System and Propagate the Golden Image
The first time the entire system is started with the startsys command, power to each node
is turned on, and each node boots from its network adapter and automatically downloads the
SystemImager automatic installation environment. This environment automatically installs and
configures each node from the golden image.
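For reference, a minimal invocation from the head node might look like this; this sketch
assumes startsys requires no arguments in the default case (consult the HP XC
documentation for any applicable options):
# startsys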
The number of nodes to be installed influences the amount of time it takes to complete the
process. After all nodes are installed, they automatically reboot to the login prompt. This
process can take up to 45 minutes on a system with 64 nodes and approximately 2 hours on
a system with 128 nodes.
Response time is affected by the node count and the amount of memory installed in the head
node. A high node count combined with a relatively small amount of head node memory might
cause the system to appear unresponsive. Table 4-4 describes some configurations that might
be prone to this problem.