HP XC System Software Administration Guide Version 3.1
7 Monitoring the System
System monitoring can identify situations before they become problems. This chapter addresses the
following topics:
• “Monitoring Tools” (page 83)
• “Monitoring Strategy” (page 84)
• “Displaying System Environment Data” (page 85)
• “Monitoring Disks” (page 85)
• “Displaying System Statistics” (page 85)
• “Logging Node Events” (page 87)
• “The collectl Utility” (page 89)
• “HP Graph” (page 92)
• “The netdump and crash Utilities” (page 96)
7.1 Monitoring Tools
Tools for monitoring an HP XC system include the following:
• Standard Linux monitoring commands:
— ps
— sar
— top
— uptime
— vmstat
— w
You can run these administrative commands on any node to determine the health of that
individual node. For more information about these commands, see their manpages.
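Commands such as uptime and vmstat read the kernel counters that Linux exposes under /proc. The following is a minimal sketch, not an HP XC tool. It reads the same load averages and memory figures on a single node and assumes a Linux node with /proc mounted. The helper names (parse_loadavg, parse_meminfo, node_summary) are hypothetical.

```python
"""Sketch: read the load and memory counters that uptime/vmstat report.

Assumes a Linux node with /proc mounted; helper names are hypothetical.
"""
from __future__ import annotations

from pathlib import Path


def parse_loadavg(text: str) -> tuple[float, float, float]:
    # /proc/loadavg begins with the 1-, 5-, and 15-minute load averages.
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text: str) -> dict[str, int]:
    # Lines look like "MemTotal:   16318412 kB"; keep the numeric value only.
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def node_summary(proc: Path = Path("/proc")) -> str:
    load = parse_loadavg((proc / "loadavg").read_text())
    mem = parse_meminfo((proc / "meminfo").read_text())
    # MemAvailable exists only on newer kernels; fall back to MemFree otherwise.
    avail = mem.get("MemAvailable", mem.get("MemFree", 0))
    return (f"load average: {load[0]:.2f}, {load[1]:.2f}, {load[2]:.2f}; "
            f"memory available: {avail} of {mem['MemTotal']} kB")


if __name__ == "__main__":
    print(node_summary())
```

The script reports only on the node where it runs, just like the standard commands listed above.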
• Utilities developed by HP:
— The collectl utility. See “The collectl Utility” (page 89) for more information.
— The HP XC shownode metrics command, which you can issue from any node in the HP XC
system, monitors the status of all nodes in the system.
The following arguments to the shownode metrics command report node status:
◦ shownode metrics cpus
◦ shownode metrics cputotals
◦ shownode metrics load
◦ shownode metrics mem
◦ shownode metrics paging
◦ shownode metrics sensors
◦ shownode metrics swap
For more information, see “Displaying System Statistics” (page 85) and shownode(8).
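To capture every shownode metrics report in a single pass, you can wrap the arguments listed above in a short script. This is a minimal sketch that assumes shownode is on the PATH of the node where the script runs. It uses only the argument names listed above and makes no assumptions about their output format. The helper names (build_commands, collect) are hypothetical.

```python
"""Sketch: run each shownode metrics report and collect its text output.

Assumes the HP XC shownode command is on PATH; helper names are hypothetical.
"""
from __future__ import annotations

import subprocess
from typing import Callable

# Arguments taken from the shownode metrics list in this section.
METRICS = ("cpus", "cputotals", "load", "mem", "paging", "sensors", "swap")


def build_commands(metrics=METRICS) -> list[list[str]]:
    # One argv list per report, for example ["shownode", "metrics", "load"].
    return [["shownode", "metrics", m] for m in metrics]


def collect(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> dict[str, str]:
    reports: dict[str, str] = {}
    for argv in build_commands():
        # Capture output as text; keep stderr if a report exits with an error.
        result = run(argv, capture_output=True, text=True)
        reports[argv[-1]] = result.stdout if result.returncode == 0 else result.stderr
    return reports


if __name__ == "__main__":
    for name, text in collect().items():
        print(f"== shownode metrics {name} ==")
        print(text)
```

The runner is passed in as a parameter so that you can test the script without an HP XC system. If shownode is not installed, subprocess.run raises FileNotFoundError.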
• Externally developed software:
— The Nagios Web-based utility displays a series of windows that provide system statistics.
Chapter 8 (page 101) discusses Nagios.
— Supermon is a highly scalable, high-speed cluster monitoring system that supplies all the
node statistics the Nagios subsystem requires. System statistics are tiered, aggregated, and
stored in the configuration and management database (CMDB).
Supermon consists of the kernel modules that collect the statistics, the mond and supermond
daemons, and the script that loads and configures the daemons.
The data collected by Supermon includes system performance, sensor, and environment data,
such as fan, temperature, and power supply status. Supermon collects this data at regular intervals.
— The syslog and syslog-ng Services
7.1 Monitoring Tools 83