HP XC System Software Administration Guide Version 3.1
StaggerSlotSize Generally, the increment of time a process pauses before sending its
message. For n tasks, n staggered time slots are defined in increments
of (StaggerSlotSize * 0.001) seconds; that is, StaggerSlotSize is
expressed in milliseconds. The first task sends its message
immediately, the second task pauses one increment before sending its
message, the third task pauses two increments, and so on. The default
value of this parameter is 1 (one millisecond).
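The following Python sketch works through the staggering arithmetic described above. It models only the schedule, not SLURM itself, and the function name stagger_delays is hypothetical. Task i (counting from 0) waits i increments, and each increment is StaggerSlotSize milliseconds.

```python
def stagger_delays(num_tasks: int, stagger_slot_size: int = 1) -> list[float]:
    """Return the pause, in seconds, before each task sends its message.

    Hypothetical helper: models the StaggerSlotSize schedule only.
    Task i (0-based) waits i increments of StaggerSlotSize * 0.001 seconds.
    """
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    # i * StaggerSlotSize milliseconds, converted to seconds
    return [i * stagger_slot_size / 1000 for i in range(num_tasks)]


if __name__ == "__main__":
    # Four tasks with StaggerSlotSize=2 pause 0, 2, 4, and 6 ms.
    for task, delay in enumerate(stagger_delays(4, stagger_slot_size=2)):
        print(f"task {task}: pause {delay * 1000:.0f} ms")
```

With the default value of 1, the last of n tasks therefore waits (n - 1) milliseconds before sending its message.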
If you change the value of any of these parameters, specify them as a comma-separated list
enclosed in quotation marks, as shown here:
JobAcctParameters="Frequency=10,MaxSendRetries=5,StaggerSlotSize=2"
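As an illustration of that format, the following Python sketch splits a JobAcctParameters assignment into its individual settings. It is a hypothetical helper, not part of SLURM. It assumes that every value is an integer, as in the examples in this guide.

```python
def parse_job_acct_parameters(line: str) -> dict[str, int]:
    """Parse a JobAcctParameters="name=value,..." line into a dict.

    Hypothetical helper; assumes all values are integers, as in
    JobAcctParameters="Frequency=10,MaxSendRetries=5,StaggerSlotSize=2".
    """
    key, sep, value = line.strip().partition("=")
    if key != "JobAcctParameters" or not sep:
        raise ValueError("not a JobAcctParameters line")
    value = value.strip()
    # The whole list must be enclosed in quotation marks.
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("parameter list must be enclosed in quotation marks")
    params = {}
    for item in value[1:-1].split(","):  # comma-separated name=value pairs
        name, eq, setting = item.strip().partition("=")
        if not name or not eq:
            raise ValueError(f"malformed parameter: {item!r}")
        params[name] = int(setting)
    return params
```

For example, parsing the line shown above yields {'Frequency': 10, 'MaxSendRetries': 5, 'StaggerSlotSize': 2}.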
f. Verify that this portion of the slurm.conf file resembles the following (the changes are shown
in bold):
.
.
.
#
# o Define the job accounting mechanism
#
JobAcctType=jobacct/log
#
# o Define the location where job accounting logs are to
# be written. For
# - jobacct/none - this parameter is ignored
# - jobacct/log - the fully-qualified file name
# for the data file
#
JobAcctLoc=/hptc_cluster/slurm/job/jobacct.log
JobAcctParameters="Frequency=10"
.
.
.
g. Save the file.
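If you want to check the saved file without reading it by eye, the following Python sketch compares the job accounting settings in slurm.conf with the values in the excerpt above. It is a minimal, hypothetical check. It assumes simple KEY=VALUE lines with # comments, ignores any other slurm.conf syntax, and uses the EXPECTED values only as placeholders for your site's settings.

```python
# Expected job accounting settings, copied from the slurm.conf excerpt above.
# Placeholder values: adjust them if your site uses different settings.
EXPECTED = {
    "JobAcctType": "jobacct/log",
    "JobAcctLoc": "/hptc_cluster/slurm/job/jobacct.log",
    "JobAcctParameters": '"Frequency=10"',
}


def read_settings(text: str) -> dict[str, str]:
    """Collect KEY=VALUE assignments, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def mismatches(text: str) -> list[str]:
    """Return one message per expected setting that is missing or different."""
    found = read_settings(text)
    problems = []
    for key, want in EXPECTED.items():
        got = found.get(key)
        if got != want:
            problems.append(f"{key}: expected {want}, found {got}")
    return problems
```

To use it, read your slurm.conf file into a string and pass it to mismatches(). An empty list means the three settings match the excerpt.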
5. Restart the slurmctld and slurmd daemons:
# cexec -a "service slurm restart"
14.5 Monitoring SLURM
The SLURM squeue, sinfo, and scontrol utilities and the Nagios system monitoring utility provide
the means for monitoring and controlling SLURM on your HP XC system.
For status at a glance, the Nagios system monitor provides a global view of your system and includes
details about the state of SLURM. Chapter 8 (page 101) provides information about Nagios on the HP XC
system.
You can run the scontrol ping command to confirm that the SLURM control daemons are active. In the
following example, node n5, which runs the primary slurmctld daemon, and node n8, which runs the
backup slurmctld daemon, are both up.
# scontrol ping
Slurmctld(primary/backup) at n5/n8 are UP/UP
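If you monitor the control daemons from a script, the following Python sketch parses a line in the format shown above. It assumes that scontrol ping prints exactly this format, which may differ in other SLURM versions. The function names are hypothetical. The optional run_scontrol_ping() helper uses the standard subprocess module and requires scontrol to be on the PATH.

```python
import re
import subprocess

# Matches output such as: Slurmctld(primary/backup) at n5/n8 are UP/UP
PING_PATTERN = re.compile(
    r"Slurmctld\(primary/backup\) at ([^/\s]+)/([^/\s]+) are ([^/\s]+)/([^/\s]+)"
)


def parse_scontrol_ping(output: str) -> dict[str, tuple[str, str]]:
    """Return {"primary": (node, state), "backup": (node, state)}."""
    match = PING_PATTERN.search(output)
    if match is None:
        raise ValueError("unrecognized scontrol ping output")
    primary_node, backup_node, primary_state, backup_state = match.groups()
    return {
        "primary": (primary_node, primary_state),
        "backup": (backup_node, backup_state),
    }


def both_up(output: str) -> bool:
    """True only if both the primary and the backup slurmctld report UP."""
    return all(state == "UP" for _, state in parse_scontrol_ping(output).values())


def run_scontrol_ping() -> str:
    """Run scontrol ping and return its standard output (needs SLURM installed)."""
    return subprocess.run(
        ["scontrol", "ping"], capture_output=True, text=True, check=True
    ).stdout
```

For the example output above, parse_scontrol_ping() returns {'primary': ('n5', 'UP'), 'backup': ('n8', 'UP')}, and both_up() returns True.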
The sinfo command reports the status of both nodes and partitions. Consider this example:
# sinfo --all
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST