HP XC System Software Installation Guide Version 3.2

The sinfo example shown in this section illustrates the Low RealMemory reason. This reason is more
obscure and can be a side effect of the system configuration process. It is reported when
the SLURM slurm.conf file is configured with a RealMemory value that is higher than the
MemTotal value that the compute node reports in its /proc/meminfo file. SLURM
does not automatically return a node to service after it has failed for this reason.
Assuming that the memory hardware is functioning, follow this procedure to resolve the problem:
1. Ensure that the database has the correct total memory value for the affected node. In this
example, n15 is the affected node.
# pdsh -w n15 /opt/hptc/etc/nconfig.d/C50gather_data
2. Configure SLURM with the correct memory value for this node:
# spconfig
Configured nodes n[1-13] with 2 CPUs and 3017 MB of total memory...
Configured node n14 with 4 CPUs and 3522 MB of total memory...
Configured node n15 with 8 CPUs and 3648 MB of total memory...
Configured node n16 with 4 CPUs and 2008 MB of total memory...
Updating SLURM...
SLURM Post-Configuration Done.
3. Return the affected node to operation:
# scontrol update NodeName=n15 State=idle
4. Verify that the LSF partition exists and all nodes are in the idle state:
# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 16 idle n[1-16]
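The condition behind the Low RealMemory reason can be checked by hand before running the procedure above. The following is a minimal sketch, not part of the HP XC tool set: the function names memtotal_mb and realmemory_ok are hypothetical, and the sketch assumes that RealMemory is expressed in megabytes while MemTotal in /proc/meminfo is expressed in kB. The exact rounding that SLURM applies is also an assumption.

```shell
# Hypothetical helper: print the MemTotal value of a meminfo-style file
# in whole megabytes (the file reports the value in kB).
memtotal_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "$1"
}

# Hypothetical helper: succeed only if the configured RealMemory value (MB)
# does not exceed the memory that the node reports.
# $1 = RealMemory value taken from slurm.conf
# $2 = meminfo file to read (defaults to /proc/meminfo)
realmemory_ok() {
    configured_mb=$1
    meminfo_file=${2:-/proc/meminfo}
    actual_mb=$(memtotal_mb "$meminfo_file")
    [ "$configured_mb" -le "$actual_mb" ]
}
```

Run on the affected node, a call such as realmemory_ok 3648 returns a nonzero status if the configured value is higher than the memory the node reports, which is the situation that this procedure corrects.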
12.6.1 SLURM Reconfiguration Errors
This note applies only to systems using a QsNetII interconnect.
The last few lines of output of the spconfig command might contain the following error:
Updating SLURM...
slurm_reconfigure error: Slurm backup controller in standby mode
SLURM Post-Configuration Done.
This error is intermittent and benign. It occurs when the spconfig command updates
SLURM with the compute node information too soon after it restarted SLURM to include the
elanhosts information. No corrective action is required.
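When spconfig output is captured to a file, it can be useful to confirm that the benign message is the only error present. The following is a minimal sketch under that assumption. The helper names are hypothetical, and matching other problems on the word "error" is an illustrative guess, not a documented spconfig output format.

```shell
# The benign message, copied from the spconfig output shown above.
BENIGN_MSG='slurm_reconfigure error: Slurm backup controller in standby mode'

# Hypothetical helper: read captured spconfig output on stdin and succeed
# if the known benign message is present.
has_benign_reconfigure_error() {
    grep -F -q "$BENIGN_MSG"
}

# Hypothetical helper: read captured spconfig output on stdin and print any
# line that mentions an error other than the known benign message.
other_errors() {
    grep -i 'error' | grep -F -v "$BENIGN_MSG"
}
```

If other_errors prints nothing, the captured output contains no error lines beyond the one described in this section.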
12.7 Troubleshooting the Software Upgrade Procedure
The following list provides suggestions for troubleshooting problems you might encounter when
upgrading the HP XC System Software from a previous release to this release:
Look at the upgrade log files to determine if there were any upgrade failures. Table 12-2
(page 177) lists the log files that are generated during a software upgrade.
Table 12-2 Software Upgrade Log Files
File Name                                   Contents
/opt/hptc/upgrade/rpm_qa_output.log         List of RPMs installed on the system before the upgrade
/var/log/preupgradesys/preupgradesys.log    Results of the preupgradesys script
/var/log/upgradesys/upgradesys.log          Results of the upgradesys script
/opt/hptc/etc/sysconfig/upgrade             A directory that contains files created as a result of running the upgradesys script
/var/log/yum_upgrade.log and                Results of the upgraderpms utility