HP XC System Software Administration Guide Version 4.0

Service: Root key synchronization
Status Information: Root SSH key synchronization status
This entry provides the status of the root key synchronization.
A warning or critical message indicates that the root ssh keys for one or more hosts are out of
synchronization with the head node. The ssh and pdsh commands may not work for these nodes.
Verify that the imaging is correct on the affected nodes. The most common cause of this problem is
a node that failed to reimage and booted a kernel with an older set of ssh keys
(/root/.ssh/*).
If none of the nodes is synchronized, determine whether the head node changed its root ssh keys.
See “Mismatched Secure Shell Keys” (page 248) for more information.
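One way to compare key sets by hand is to compute a single digest over the files in /root/.ssh on the head node and on an affected node (for example, from the node's console) and check whether the two values agree. The following is a minimal sketch only; the function name ssh_key_digest is hypothetical and is not part of the HP XC software, and it assumes that every regular file in the directory is expected to be identical on all nodes.

```shell
# Hypothetical helper: print one checksum covering every regular file
# in a key directory (default /root/.ssh, the path named in this guide).
ssh_key_digest() {
    dir="${1:-/root/.ssh}"
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (subdirectories, unmatched glob)
        if [ -f "$f" ]; then
            # Emit the file name followed by its cksum so renamed files also change the digest
            printf '%s ' "${f##*/}"
            cksum < "$f"
        fi
    done | cksum
}
```

Run ssh_key_digest with no argument on each node and compare the printed values. Differing output suggests that the node holds a different set of key files than the head node.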
Service: Supermon Metrics Monitor
Status Information: Supermon node metrics retrieval status
This entry reports the status of the Supermon service and the number of nodes from which it collected
metrics data.
A warning or critical message indicates that one or more hosts were not accessible during metrics
collection or that the Nagios service_check_timeout interval expired.
These messages can occur if metrics collection cannot be completed in a reasonable time; examine the
/opt/hptc/nagios/etc/nagios.cfg file for the value of the service_check_timeout
parameter.
The default should be adequate for HP XC systems with fewer than 256 nodes.
Increasing the value for the service_check_timeout parameter may solve the problem for systems
with more nodes.
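The current setting can be read directly from the configuration file. The sketch below assumes the usual Nagios name=value syntax for the parameter; the function name get_service_check_timeout is hypothetical and is shown for illustration only.

```shell
# Hypothetical helper: print the value of service_check_timeout from a
# Nagios main configuration file. The default path is the one named in this guide.
get_service_check_timeout() {
    cfg="${1:-/opt/hptc/nagios/etc/nagios.cfg}"
    # Print the numeric value of each uncommented definition, then keep the last one
    sed -n 's/^[[:space:]]*service_check_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1
}
```

Calling get_service_check_timeout with no argument reads the default file. Pass a different path to inspect a copy of the file before editing it.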
Also, verify that the supermond service is running by invoking the following command on the head
node:
# service supermond status
Loss or time-outs of this service can cause per-node warnings for nodeinfo, load average and
system free space.
A non-timeout warning or critical message simply indicates a number of monitored nodes are not
responding; this is normal if the nodes are down or otherwise disabled.
Service: Syslog Alert Monitor
Status Information: Status of consolidated.log syslog monitoring
Typically, this entry reports the number of new records processed in the
/hptc_cluster/adm/logs/consolidated.log file.
A warning or critical message occurs when there is insufficient time to process a huge volume of
messages before the Nagios service_check_timeout period expires.
Nagios examines the recent incoming consolidated log messages and issues a warning or critical
message if the incoming rate since the last interval exceeds a configured number of records. The default
values are 2 for warnings and 20 for critical. See
/opt/hptc/nagios/libexec/check_syslogalerts for details.
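The threshold behavior described above can be illustrated with a short sketch. This is not the code of the check_syslogalerts plug-in; the function name classify_syslog_rate is hypothetical, and the strict greater-than comparison is an assumption based on the word "exceeds".

```shell
# Hypothetical illustration: classify a count of new log records against
# warning and critical thresholds. The defaults (2 and 20) are the values
# stated in this guide.
classify_syslog_rate() {
    count="$1"
    warn="${2:-2}"
    crit="${3:-20}"
    if [ "$count" -gt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$count" -gt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}
```

Under these assumptions, classify_syslog_rate 5 prints WARNING and classify_syslog_rate 25 prints CRITICAL, while a count of 2 or fewer prints OK.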
No specific action is required unless the service times out. A time-out means that more syslog
messages were collected across the system than the plug-in can process within the
service_check_timeout period. See the /opt/hptc/nagios/etc/nagios.cfg file for the
value of the service_check_timeout parameter. Running the following command on the node
reporting the error solves the problem:
# /opt/hptc/nagios/libexec/check_syslogalerts domain node:nagios_monitor nsca
Otherwise, wait for the nightly log to roll over.
Service: Syslog Alerts
Status Information: Node Syslog alerts information
21.3 Messages Reported by Nagios 253