HP XC System Software Administration Guide Version 3.1

20 Troubleshooting
This chapter provides information to help you troubleshoot problems with HP XC systems. It addresses
the following topics:
“General Troubleshooting” (page 229)
“Nagios Troubleshooting” (page 229)
“System Interconnect Troubleshooting” (page 235)
“SLURM Troubleshooting” (page 240)
“LSF-HPC Troubleshooting” (page 241)
See also Chapter 19 (page 215) for information on the diagnostic tools that you can use to locate the
source of a failure.
20.1 General Troubleshooting
This section contains general troubleshooting information for HP XC systems.
20.1.1 Mismatched Secure Shell Keys
If a node on your system has a mismatched Secure Shell (ssh) key, review the following list for the source
of the problem:
The node was not imaged and was booted from an old image that had older ssh keys. In this instance,
it is the image, not the keys, that is out of synchronization.
You can solve this problem by imaging the node properly and rebooting.
The keys were regenerated on the head node. Typically, the cluster_config utility was run with
the option to regenerate the keys.
If you absolutely must synchronize keys, enter the following command:
# /opt/hptc/nagios/libexec/check_keys --update
This command uses the Nagios nrpe plug-in executor to create a checksum for the keys with the
superuser. If a node has a key that is out of synchronization, the node requests and receives a new copy
of the key.
This method offers a means of recovery if the root ssh keys are somehow damaged or corrupted.
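The checksum comparison described above can also be illustrated by hand. The following is a minimal sketch only, not the implementation of check_keys: the helper name mismatched_nodes is hypothetical, and the key path shown in the usage comment is an assumption that may differ on your system. The helper reads pdsh-style output lines of the form "node: checksum path" and prints the nodes whose checksum differs from a reference value.

```shell
#!/bin/sh
# Hypothetical helper (not part of HP XC): report the nodes whose key
# checksum differs from a reference checksum.
#   stdin: pdsh-style lines, "node: <checksum>  <path>"
#   $1:    reference checksum, for example the head node's
mismatched_nodes() {
    awk -v ref="$1" '{
        node = $1
        sub(/:$/, "", node)          # strip the trailing colon that pdsh adds
        if ($2 != ref) print node    # checksum differs from the reference
    }'
}

# Possible usage on a live system (the key path is an assumption):
#   ref=$(md5sum /root/.ssh/id_dsa.pub | awk '{print $1}')
#   pdsh -a "md5sum /root/.ssh/id_dsa.pub" | mismatched_nodes "$ref"
```

A node that this sketch reports is a candidate for reimaging or, if you absolutely must synchronize keys, for the check_keys --update command described above.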
20.2 Nagios Troubleshooting
This section contains general troubleshooting information for the Nagios application.
NOTE: Nagios runs only on nodes with the management_server or management_hub roles.
See “Messages Reported by Nagios” for additional information.
20.2.1 Determining the Status of the Nagios Service
Use the following command to determine if Nagios is running properly:
# pdsh -a "service nagios status"
Nagios ok: located 1 process, status log updated 22 seconds ago
Gathering status for nrpe ... n[3-8]NRPE v2.0 - n[3-8]
Nagios nsca:
n7: 0 data packet(s) sent to host successfully.
n5: 0 data packet(s) sent to host successfully.
n6: 0 data packet(s) sent to host successfully.