HP XC System Software Administration Guide Version 3.2.1

useful commands to collect and present data in a scalable and intuitive fashion. The Web pages
update automatically at a preconfigured interval (120 seconds by default).
To open the Web page, open a browser on the head node and point it to the following URL:
https://head_node_fully_qualified_domain_name/resmon
You are prompted to supply your Nagios user name and password (which were defined during
the initial installation and configuration of the HP XC system).
As shown in Figure 7-6 (page 103), the resmon window has a heading and three major portions:
Heading The top left of the resmon Web page has a link to the Nagios open source
application. See Chapter 8 (page 107) for information on Nagios.
The top right of the resmon Web page specifies the last time the Web page was
updated, how often the page is updated, and the last load update.
Resources By default, this portion displays the load of each node in the HP XC system. You
can alternate between this setting and the physical memory usage (total memory
minus free memory, divided by total memory) by selecting the
corresponding link in this section.
Each small rectangle represents a node in the HP XC system. The nodes are
organized into rows that show the nodes allocated in SLURM partitions and
“Non-SLURM nodes.” The rectangles feature a base and an indicator for either
load or memory usage. You can determine the state of the node by the color
coding of the corresponding rectangle's base. A background color on the rectangle
indicates that a job is allocated to the node. Full details are provided in the Key portion
of the Web page.
The resmon utility gathers individual node CPU load and memory data from
the metrics monitoring infrastructure. The data is obtained from the HP XC
shownode metrics load and shownode metrics mem commands. The
resmon utility also gathers a CPU count for each node from the cluster
management database (CMDB).
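The memory usage figure described above is a simple ratio. The following is a minimal sketch of that calculation, assuming total and free memory are reported in the same unit. The function name memory_usage_fraction and the sample values are hypothetical and are not part of the resmon utility:

```python
def memory_usage_fraction(total_mem: float, free_mem: float) -> float:
    """Return used physical memory as a fraction of total memory."""
    if total_mem <= 0:
        raise ValueError("total_mem must be positive")
    if not 0 <= free_mem <= total_mem:
        raise ValueError("free_mem must be between 0 and total_mem")
    # Used memory is total minus free; divide by total to get the fraction.
    return (total_mem - free_mem) / total_mem


# Illustrative values only: a node with 8192 MB total and 2048 MB free.
print(memory_usage_fraction(8192, 2048))  # 0.75
```

A result of 0.75 means that three quarters of the node's physical memory is in use.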
Jobs For each job, this portion displays the Job Identifier (JobID), the user who launched
the job, the job status, the name of the queue for the job, the time the job was
submitted, the time the job actually started, the number of cores for the job, and
the nodes designated for the job.
The resmon utility gathers node state and job information from the resource
management components that have been configured on the HP XC system. These
components are the Load Sharing Facility (LSF) by Platform Computing, the
open source Simple Linux Utility for Resource Management (SLURM) by
Lawrence Livermore National Laboratory, or both. The LSF bhosts and bjobs
commands and the SLURM scontrol, sinfo, and squeue commands are used
to gather node and job state information.
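As a rough illustration of collecting job state from SLURM, the sketch below parses text in the shape produced by `squeue -h -o "%i|%u|%T|%P"` (job ID, user, state, partition). It does not describe how the resmon utility is implemented. The function name parse_squeue, the choice of format string, and the sample lines are all assumptions made for this example:

```python
from typing import Dict, List

# Field names matching the assumed format string "%i|%u|%T|%P".
FIELDS = ("job_id", "user", "state", "partition")


def parse_squeue(text: str) -> List[Dict[str, str]]:
    """Parse pipe-delimited squeue output into one dict per job."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("|")
        if len(parts) != len(FIELDS):
            continue  # skip lines that do not match the expected format
        jobs.append(dict(zip(FIELDS, parts)))
    return jobs


# Invented sample input, not captured from a real system.
sample = "118|alice|RUNNING|lsf\n119|bob|PENDING|lsf\n"
for job in parse_squeue(sample):
    print(job["job_id"], job["user"], job["state"], job["partition"])
```

On a live system, the input text could be obtained by running the squeue command with the same format string and passing its standard output to parse_squeue.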
Key This portion of the resmon Web page describes the various symbols used on the
page.