HP XC System Software Administration Guide Version 4.
© Copyright 2003, 2004, 2005, 2006, 2007, 2008, 2009 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
About This Document This document describes the procedures and tools that are required to maintain the HP XC system. It provides an overview of the administrative environment and describes administration tasks, node maintenance tasks, Load Sharing Facility (LSF®) administration tasks, and troubleshooting information. An HP XC system is integrated with several open source software components. Some open source software components are being used for underlying technology, and their deployment is transparent.
• There is a change to the updateimage command; the option --gc imageserver designates the golden client. Also, the si_rmimage command is used in conjunction with the updateimage command to create a completely new golden image.
• There are various changes with regard to the Platform LSF brand, including a change to the LSF service ports in the firewall.
NOTE A note contains additional information to emphasize or supplement important points of the main text. HP XC and Related HP Products Information The HP XC System Software Documentation Set, the Master Firmware List, and HP XC HowTo documents are available at this HP Technical Documentation web address: http://docs.hp.com/en/linuxhpc.
HP Scalable Visualization Array The HP Scalable Visualization Array (SVA) is a scalable visualization solution that is integrated with the HP XC System Software. The SVA documentation is available at the following web address: http://docs.hp.com/en/linuxhpc.html HP Cluster Platform The cluster platform documentation describes site requirements, shows you how to set up the servers and additional devices, and provides procedures to operate and manage the hardware.
— HP XC System Software Administration Guide
— HP XC System Software User's Guide
• https://computing.llnl.gov/linux/slurm/documentation.html
Documentation for the Simple Linux Utility for Resource Management (SLURM), which is integrated with LSF to manage job and compute resources on an HP XC system.
• http://www.nagios.org/
Home page for Nagios®, a system and network monitoring application that is integrated into an HP XC system to provide monitoring capabilities.
Linux Web Addresses • http://www.redhat.com Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a Linux distribution with which the HP XC operating environment is compatible. • http://www.linux.org/docs/index.html This web address for the Linux Documentation Project (LDP) contains guides that describe aspects of working with Linux, from creating your own Linux system from scratch to bash script writing.
Software RAID Web Addresses
• http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html and http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/pdf/Software-RAID-HOWTO.pdf
A document (in two formats: HTML and PDF) that describes how to use software RAID under a Linux operating system.
• http://www.linuxdevcenter.com/pub/a/linux/2002/12/05/RAID.html
Provides information about how to use the mdadm RAID management utility.
1 HP XC Administration Environment This chapter introduces the HP XC Administration Environment.
management nodes Perform system management and other functions for the HP XC system. These functions can include the following: • Maintain system software images for installing and updating software on all the nodes in HP XC system • External network communications • Networked file system services • System monitoring • Resource management A management node often hosts multiple management functions. A node is classified based on the service it provides, so it can be known by multiple names.
The login service is assigned to any node in the LVS ring. LVS assigns a user login to one of these nodes whenever that user requests a login to the system through the system name or Internet Protocol (IP) address. A login node requires a connection to the external network. • I/O service An I/O node provides I/O services for the system. The I/O nodes provide access to Storage Area Networks (SANs) connected to the node, access to file systems that are mounted locally to the node, or both.
1.1.3 Roles A role is defined as an aggregate of related services. 1.2 File System Proper maintenance of the file system is crucial to the operation of the HP XC system. This section describes key directories and offers guidelines for the maintenance of the file system. The basic file system layout is the same as that of a standard Linux file system. Additions to the layout for the HP XC system are described in this section.
Note: This directory contains two separate subdirectories, one for HP XC System Software and one for other HP software products. Make individual directories for each vendor or software product. /opt/hp Is reserved for optional HP applications and utilities that apply to the HP XC system. HP-MPI is an example of such a package. /opt/hptc Reserved for the exclusive use of the HP XC System Software. HP XC specific software and associated software packages are maintained under this directory.
This directory is mounted on the head node. You must ensure the persistence of this file system mount. Caution: Do not add, replace, or remove any files in the /hptc_cluster directory. Doing so will cause the HP XC system to fail. 1.2.1.2 HP XC System Software Directory, /opt/hptc The HP XC System Software maintains the /opt/hptc directory for its exclusive use. Its software is installed in that directory.
Figure 1-2 HP XC Hierarchy Under /opt/hptc
1.2.1.3 HP XC Service Configuration Files
The /opt/hptc/etc/ directory includes several subdirectories containing scripts used to configure services on nodes at installation time. The /opt/hptc/etc/sconfig.d directory contains scripts for system configuration. The /opt/hptc/etc/gconfig.d directory contains scripts used to gather information needed to configure a service on the HP XC system. The /opt/hptc/etc/nconfig.d directory contains scripts that configure a service on each individual node.
Table 1-1 Log Files (continued)
Myrinet® gm_drain_test: /var/log/diag/myrinet/gm_drain_test/
Myrinet gm_prodmode_mon diagnostic tool: /var/log/diag/myrinet/gm_prodmode_mon/links.log
ovp: /hptc_cluster/adm/logs/ovp/ovp_nodename_mmddyy[rnn]; /hptc_cluster/adm/logs/aggregator_nodename.log (alerts); /hptc_cluster/adm/logs/ovp/current_ovp_log (a symbolic link to the most recent log file)
powerd: /var/log/powerd/powerd.
Table 1-2 HP XC System Commands (continued)
collectl: The collectl utility collects data on the nodes of the HP XC system and plays back the information as ASCII text or in a plot form. For more information, see “The collectl Utility” (page 92). Manpage: collectl(1)
console: The console command enables access to the consoles of all application nodes. Manpage: console(8)
controllsf: Use the controllsf command to control the execution of LSF with SLURM on the HP XC system.
openipport: The superuser uses the openipport command to open a specified port in the firewall. Manpage: openipport(8)
ovp: Use the ovp utility to verify the installation, configuration, and operation of the HP XC system. Manpage: ovp(8)
power: Use the power command to control the power for a set of nodes and to interrogate their current state.
xcxclus, xcxperf: The xcxclus and xcxperf X11 clients enable you to monitor the performance of multiple and individual systems. The xcxclus graphic utility enables you to monitor a number of nodes simultaneously. The xcxperf utility provides a graphic display of node performance for a variety of metrics. These utilities are described in the HPCPI and Xtools User Guide. Manpages: xcxclus(1) and xcxperf(1)
Important: Do not pass a command that requires interaction as an argument to the pdsh command. Prompting from the remote node can cause the command to hang.
The following example runs the uptime command on all the nodes in a four-node system.
# pdsh -a "uptime"
n4: 15:51:40 up 2 days, 2:41, 4 users, load average: 0.48, 0.29, 0.11
n3: 15:49:17 up 1 day, 4:55, 0 users, load average: 0.00, 0.00, 0.00
n2: 15:50:32 up 1 day, 4:55, 0 users, load average: 0.00, 0.00, 0.00
n1: 15:47:21 up 1 day, 4:55, 0 users, load average: 0.00, 0.00, 0.
# cexec -a "who --count" n12: root bg rmk n12: # users=3 n25: root wra guest spg n25: # users=4 For additional information, see cexec(1). 1.4 Configuration and Management Database The HP XC system stores information about the nodes and system configuration in the configuration and management database (CMDB). This is a MySQL database that runs on the node with the node management role. The CMDB is constructed during HP XC system installation.
Reconfiguration For a system reconfiguration, the policy in effect is to preserve any customizing you have done to a Linux configuration file unless the change undermines the proper operation of the HP XC System Software. In that case, the HP XC System Software overwrites the configuration file and the changes you made are deleted. Some configuration parameters can be expressed in terms of a range.
Table 1-3 HP XC Configuration Files (Component; Referenced in; Configuration Files)
collectl utility; Chapter 7 (page 85); /opt/hp/collectl/etc/collectl.ini
Cluster configuration; Chapter 16 (page 189), Appendix A (page 291); /opt/hptc/config/base_addr.ini
Configuration and management database (CMDB); N/A; /etc/my.cnf
Ethernet port mappings; N/A; /opt/hptc/config/modelmap
Firewall; Chapter 12 (page 153); /etc/sysconfig/iptables.proto, /etc/sysconfig/ip6tables.
1.5.3 Configuration Files in Imaged Nodes Client nodes receive their image from the HP XC system's golden master. Unless you either update the golden master or set an override file, the changes made locally to configuration files on the client nodes are lost the next time the node is re-imaged. For more information on the golden master and how to distribute software throughout the HP XC system, see Chapter 11 (page 141). 1.
1.8 Networking The HP XC system relies on networking services to communicate among its nodes. It uses the Linux Virtual Server, Network Time Protocol, Network Address Translation, and Network Information Service. 1.8.1 Linux Virtual Server for HP XC Cluster Alias The HP XC system uses the Linux Virtual Server (LVS) to present a single host name for user logins. LVS is a highly scalable virtual server built on a cluster of real servers.
1.8.4 Network Information Service The configuration of Network Information Service (NIS) on an HP XC system is an optional step that is useful for easing user management and helpful for SLURM and LSF with SLURM use. You may decide to set up your user management with some other software, such as Lightweight Directory Access Protocol (LDAP). The HP XC System Software can be integrated with an external LDAP server for authentication.
Notes: Installing a package in a nondefault location means that you must update the corresponding modulefile; you might need to edit the PATH and MANPATH environment variables. Other changes are based on the software package and its dependencies. If you have installed a variant of the package, you might need to create a parallel modulefile specifically for the variant.
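For example, a common approach is to copy an existing modulefile as a starting point for the variant; the package name and modulefile location below are hypothetical and depend on where your site keeps modulefiles:
# cp /opt/modules/modulefiles/mypackage/1.0 /opt/modules/modulefiles/mypackage/1.0-variant
# vi /opt/modules/modulefiles/mypackage/1.0-variant
# module avail mypackage
In the copied modulefile, adjust the prepend-path (or setenv) entries for PATH and MANPATH so that they point to the nondefault installation location.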
information in an obscured form instead of in the plain text form used by the r* UNIX telnet and ftp commands. For security purposes, use the ssh command instead of these other, much less secure alternatives. To use ssh without password prompting with a user account, you must set up ssh authentication keys. For information on configuring the ssh keys, see “Configuring the ssh Keys for a User” (page 161). 1.11 Recommended Administrative Tasks Table 1-4 lists recommended administrative tasks.
Table 1-4 Recommended Administrative Tasks (continued)
When: After installing additional software or changing the system configuration
Task: Ensure that the golden image is updated. Reference: Chapter 11: “Distributing Software Throughout the System” (page 141)
Task: Run the ovp command. Reference: Chapter 20: “Using Diagnostic Tools” (page 233)
2 Improved Availability
The improved availability feature of the HP XC system offers the following benefits:
• It enables services and, thus, user jobs, to continue to run, even after a node failure.
• It enables you to run new jobs.
The improved availability feature relies on an availability tool controlling nodes and services in an availability set. The HP XC System Software provides commands to transfer control of services to the availability tool.
• nat for Network Address Translation
• nagios for the Nagios Master service
NOTE: The nagios_monitor service is not eligible.
• kdump for kernel dumps
• hptc_cluster_fs for the /hptc_cluster file system
For more information on services, see Chapter 4 (page 57).
2.3 Availability Sets
A set of nodes is designated as an availability set during the configuration of the HP XC System Software. These nodes provide failover and failback functionality.
2.4 HP Serviceguard Tasks This section provides examples of common tasks you perform for HP XC systems that use HP Serviceguard as the availability tool. For more in-depth information on Serviceguard, see the HP Serviceguard documentation at the following web address: http://docs.hp.com/en/ha.html#Serviceguard Each availability set relates to a corresponding Serviceguard cluster. NOTE: In the examples in this section, assume the PATH environment variable has been updated for Serviceguard commands. 2.4.
3. Use the Serviceguard cmviewcl command on one node of each availability set to view the status of the Serviceguard cluster. For example: # pdsh -w n13,n14 /usr/local/cmcluster/bin/cmviewcl n13: n13: CLUSTER STATUS n13: avail2 up n13: n13: NODE STATUS STATE n13: n12 up running n13: n13: PACKAGE STATUS STATE AUTO_RUN n13: n12 up running enabled n13: n13: NODE STATUS STATE n13: n13 up running n13: n13: PACKAGE STATUS STATE AUTO_RUN n13: nat.
2.4.3 Reimaging Nodes in the Availability Set Reimaging a node is the means by which you update software on a node from a central repository, called a golden master. Chapter 11 (page 141) discusses this topic in detail. Nodes must be stopped and restarted during the reimaging process. When a node is reimaged, but its partner in an availability set is still running, the reimaged node comes under the control of the availability tool automatically.
Info: HeartBeat-managed services are: '' Info: NAT HA servers are: 'n16 n14' Info: confirming the list of services to transfer from availability... Info: Serviceguard found running on these nodes: 'n16 n14' Info: Starting transfer of services from Serviceguard... stopAvailTool: ========== Executing '/opt/hptc/availability/serviceguard/stop_av ail'... Stopping HP Serviceguard cluster [-C /usr/local/cmcluster/conf/avail1.config -P /usr/local/cmcluster/conf/nat.n16/nat.n16.
3 Starting Up and Shutting Down the HP XC System
This chapter addresses the following topics:
• “Understanding the Node States” (page 51)
• “Starting Up the HP XC System” (page 52)
• “Shutting Down the HP XC System” (page 54)
• “Shutting Down One or More Nodes” (page 55)
• “Determining a Node's Power Status” (page 55)
• “Locating a Given Node” (page 55)
• “Disabling and Enabling a Node” (page 56)
Table 3-1 Node States (continued)
Boot_Fail: The node failed to boot. The node is returned to the Boot_Ready state.
AVAILABLE: The node is ready for use.
Nodes transition between node states accordingly. Figure 3-1 illustrates the transition of node states.
Notes: Nodes that are disabled either as a result of a critical service configuration failure or with the setnode --disable command cannot be started with the startsys command until they are enabled with the setnode --enable command. Planned nodes cannot be enabled with the setnode --enable command. You can start up only specified nodes by specifying them in a nodelist parameter.
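For example, a command of the following form starts only a subset of nodes; the node list shown is hypothetical, and the exact nodelist syntax is described in startsys(8):
# startsys n[10-12]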
1. Log in as superuser (root) on the head node.
2. Invoke the startsys command with the --image_and_boot option:
   # startsys --image_and_boot
3. If your system has been configured for improved availability and the nodes that provide availability have been reimaged, enter the transfer_to_avail command to transfer control to the availability tool:
   # transfer_to_avail
For additional information on options that affect imaging, see startsys(8).
NOTE: If all these conditions apply, follow the procedure to restart HP Serviceguard:
• Your system is configured for improved availability.
• Your system is configured for HP Serviceguard, but it does not start up.
• You rebooted the head node.
• You want to start up the head node only.
1. Enter the following command where hn_name is the name of the head node:
   # cmruncl -n hn_name
   The default installed path name for this command is /usr/local/cmcluster/bin/cmruncl.
2. Answer y at the prompt.
The Unit Identifier LED on the node illuminates on the node's front panel. 3. Invoke the locatenode command again, this time with the --off option to turn off the Unit Identifier LED when you are done. 3.7 Disabling and Enabling a Node You can disable one or more nodes in the HP XC system. Disabled nodes are ignored when the HP XC system is started or stopped with the startsys and stopsys commands, respectively.
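For example, the following commands disable a node and later re-enable it; the node name n5 is hypothetical, and the exact argument syntax is described in setnode(8):
# setnode --disable n5
# setnode --enable n5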
4 Managing and Customizing System Services
This chapter describes the HP XC system services and the procedures for their use. This chapter addresses the following topics:
• “HP XC System Services” (page 57)
• “Displaying Services Information” (page 59)
• “Restarting a Service” (page 61)
• “Stopping a Service” (page 62)
• “Adding a New Service” (page 74)
• “Global System Services” (page 62)
• “Customizing Services and Roles” (page 62)
Table 4-1 Linux and Third-Party System Services (continued)
IP Firewall: Sets up IP firewalls on nodes. Database name: iptables
LSF Master Node: Load Sharing Facility for HP XC master node. Database name: lsf
LVS Director: Handles the placement of user login sessions on nodes when a user logs in to the cluster alias. Database name: lvs
NAT Server: Network Address Translation server. Database name: nat
NAT Client: Network Address Translation client.
Table 4-2 HP XC System Services (continued)
Slurm Launch: Allows users to launch jobs on nodes with the slurm_compute service. Database name: slurm_launch
smartd daemon: Monitors the reliability of specific hard drives on CP6000 systems. Database name: smartd
Supermon Aggregator: Gathers information from subordinate nodes running Supermon. Database name: supermond
Image Server: Holds and distributes the system images.
ntp: n3 pdsh: n[1-3] pwrmgmtserver: n3 slurm: n3 supermond: n3 swmlogger: n3 syslogng_forward: n3 For more information, see shownode(8) You can obtain an extensive list of all services running on a given node by invoking the following command: # service --status-all 4.2.2 Displaying the Nodes That Provide a Specified Service You can use the shownode servers command to display the node or nodes that provide a specific service to a given node. You do not need to be superuser to use this command.
The shownode services node client command does not display any output if no client exists. Another keyword, servers, allows you to determine the node that provides a specified node its services.
4.4 Stopping a Service
The method to use to stop a service depends on whether or not improved availability is in effect for that service.
• “Advance Planning” (page 70)
• “Editing the roles_services.ini File” (page 70)
• “Creating a service.ini File” (page 71)
• “Adding a New Service” (page 74)
• “Verifying a New Service” (page 76)
4.6.1 Overview of the HP XC Services Configuration
HP XC System Software includes a predefined set of services that are delivered using node role assignments; however, a third-party software installation might require you to add a service that is not part of the default HP XC services model.
4.6.2 Service Configuration Sequence of Operation
To understand the relationship between the cluster_config utility and the service configuration scripts, it is important to know the sequence of events that occur during cluster_config processing:
1. Service-specific attributes are made available to the cluster_config utility in service-specific *.ini files.
2. As the superuser (root), you run the cluster_config utility on the head node to configure the HP XC system.
Table 4-4 Location of Configuration Script Directories (continued)
/opt/hptc/etc/nconfig.d/: invoked by the cluster_config nconfigure (configuration) and nunconfigure (unconfiguration) arguments
/opt/hptc/etc/cconfig.d/: invoked by the cluster_config cconfigure (configuration) and cunconfigure (unconfiguration) arguments
To see the sconfig, gconfig, nconfig, and cconfig scripts that are delivered as part of the default services configuration mode, look in the /opt/hptc/etc/*config.d/ directories.
The head node is the golden client, and only one golden client is supported. Each script in this directory is executed unconditionally during the sconfigure process. The sconfigure scripts return 0 (zero) on success and return a nonzero value on failures. You can stop the configuration process on a nonrecoverable sconfigure error, which is indicated by the sconfigure script exiting with a return code of 255. Alternatively, you can use config_die( ) in ConfigUtils.pm to return 255.
Writing gconfigure Scripts The information in this section provides information about how to write a gconfigure script. A sample gconfigure script is provided in the /opt/hptc/templates/gconfig.d/ gconfig_template.pl file for your reference. The gconfigure and other configuration scripts often use the Perl Set::Node package, which is derived from the Set::Scalar package (from Perl's CPAN) to facilitate working with sets of nodes with the usual set operators (union, intersection, difference, and so on).
3 The $assignment_flags are used to determine the pattern of the client-to-server assignments. The default is a server may be assigned to itself as a client. The following service attributes control a client's assignment to itself as a server and are valid for client and server assignments: • sa_do_not_assign_to_self • sa_must_assign_to_self • sa_may_assign_to_self This is the default, which is the same as the double quote character (“).
Nodes with na_disable_server.service assigned for a service are excluded from the server list passed into the gconfig script as servers. Nodes with na_disable_client.service assigned for a service are not returned as potential clients of that service. In general, these flags are something the gconfig script does not need to handle explicitly. Nothing precludes each gconfig script from offering optimal choices to you through its user interface.
scripts could choose that each use a disjoint 5 of the 10 servers passed in, in order to spread the load. At present, for services that are part of the same role, there are no mechanisms in place to achieve this.
4.6.7 Advance Planning
You use the cluster_config utility to assign roles and, hence, services, to nodes. To add a new service to the roles model, perform one of the following:
• Add a new service to an existing role or roles.
. common management_server EOT • The second stanza lists all the services: services = <
Example 4-2 Sample service.ini FIle [Config] # Is the service included in the default configuration? [0/1] service_recommended = 0 or 1 # How many nodes can each server handle optimally? # (Use 1000000 if there should be only one server per system.) optimal_scale_out = some_value # Must the service be run on the head node? [0/1] # Is it advantageous to run the service on the head node? [0/1] # (No more than one of these should be 1.
head_node_desired Assigning 1 to this parameter indicates that running the service on the head node is beneficial but not necessary. Assigning 0 indicates that the service does not benefit from running on the head node. Do not assign 1 to both the head_node_required and head_node_desired parameters. If you assign 1 to the head_node_desired parameter, assign 0 to the head_node_required parameter.
3 Total nodes
1 Nodes assigned service roles
0 Exclusive servers
1 Non-exclusive servers
3 Compute nodes

Recommend  Assigned   Role                      (additional columns: HN Rec, HN Req, Ext Rec, Ext Req, Exc Rec)
----------------------------------------------------------------
3          3          compute
1          1          disk_io
1          1          external (optional)
1          0          login (optional)
1          1          management_hub
1          1          management_server
1          0          nis_server (optional)
1          1          resource_management
The column headings in the middle of the report correspond to parameters in the service.
4. Use the text editor of your choice to edit the roles_services.ini file as follows: a. Add the name of the new service to the stanza that lists the services. services = <
a. Add the name of the new role to the stanza that lists all the roles: roles = <
5 Managing Licenses
This chapter describes the following topics:
• “License Manager and License File” (page 77)
• “Determining If the License Manager Is Running” (page 77)
• “Starting and Stopping the License Manager” (page 77)
5.1 License Manager and License File
The license manager service runs on the head node and maintains licensing information for software on the HP XC system. You can find additional information on the FLEXlm license manager at the Macrovision web address: http://www.macrovision.
5.3.1 Starting the License Manager
Use the following command to start the license manager:
# service hptc-lm start
5.3.2 Stopping the License Manager
Use the following command to stop the license manager:
# service hptc-lm stop
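To determine whether the license manager is currently running, you can query the same init script; this sketch assumes the hptc-lm script supports the standard status argument:
# service hptc-lm status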
6 Managing the Configuration and Management Database The configuration and management database, CMDB, is key to the configuration of the HP XC system. It keeps track of which nodes are enabled or disabled, the services that a node provides, the services that a node receives, and so on.
host_name: cp-n1
hwaddr:    00:e0:8b:01:02:03
ipaddr:    172.21.0.1
level:     1
location:  Level 1 Switch 172.20.65.2, Port 40
netmask:   255.224.0.0
region:    0

cp-n2:
cp_type:   IPMI
host_name: cp-n2
hwaddr:    00:e0:8b:01:02:04
ipaddr:    172.21.0.2
level:     1
location:  Level 1 Switch 172.20.65.2, Port 41
netmask:   255.224.0.0
region:    0
.
.
.
6.2.3 Displaying Blade Enclosure Information
You can use the shownode command to provide information for the HP XC system with HP BladeSystems. With this command, you can:
• Display the names of the blade enclosures and the nodes (server blades) within them.
• List all the blade enclosures in the HP XC system and, for each, the nodes within them.
• List the nodes for a specified blade enclosure.
• List the blade enclosure for a specified node.
For more information, see shownode(1).
in .Log. Archiving has the advantage of decreasing the size of the log tables, which enables the shownode metrics command to run more quickly. You must be superuser to use this command. Use the --archive archive-file option to specify the file to which the sensor data will be archived. The --tmpdir directory option lets you assign a temporary directory for use while archiving. You can retain previous archives with the --keep n option.
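For example, a command of the following general form archives older sensor data and retains the four most recent archives; the archive file name is hypothetical and the exact subcommand syntax is described in managedb(8):
# managedb archive --archive /hptc_cluster/adm/archives/sensor_data.archive --keep 4 4w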
command. See “Archiving Sensor Data from the Configuration Database” (page 81) for a description of the time parameter. The following command purges sensor data older than two weeks: # managedb purge 2w For more information on the managedb command, see managedb(8). The archive.cron script enables you to automate the process of archiving sensor data from the CMDB. For more information on this script, see archive.cron(4). 6.
7 Monitoring the System System monitoring can identify situations before they become problems.
The data collected by Supermon includes system performance sensor and environment data, such as fan, temperature, and power supply status. This data is collected on a regular basis. — The syslog and syslog-ng Services The syslog service runs on each node in the HP XC system. These daemons capture log information and send it to an aggregator regional node. Regional nodes are assigned to each client node.
Figure 7-1 System Monitoring The mond and syslog daemons run on every node. The Supermon service manages requests for mond daemons that run on a subset of nodes. The mond daemon can be configured to pass any metric data for aggregation to the parent Supermon service. The Nagios master and other Nagios monitors run their check_metrics plug-in periodically, which causes Supermon data collection and storage into the database.
management interface include the hpasm package. You can use the /sbin/hplog utility to display the following environment data:
• Thermal sensor data
• Fan data
• Power data
In addition, most hpasm errors are logged to the syslog system logger. For more information, see hpasm(4) and hplog(8).
7.4 Monitoring Disks
The Self-Monitoring, Analysis and Reporting Technology (SMART) system is built into many IDE, SCSI-3, and other hard drives.
times (in milliseconds) for each processor on the node from which this command is issued.
7.6 Logging Node Events This section describes how the HP XC system uses the syslog and syslogng_forward services to log node events and how these events are arranged according to the syslog-ng.conf rules file. 7.6.1 Understanding the Event Logging Structure The HP XC System Software uses aggregator nodes to log events from clients. Aggregator node assignments are made when the HP XC System Software is installed and configured. Each node in the HP XC system is assigned to an aggregator node.
Destinations Contains the devices and files where the messages are sent or saved. Logs Combines the sources, filters, and destination into specific rules to handle the different messages. You can use a text editor, such as emacs or vi, to read the log files, and you can use a variety of text manipulation commands to find, sort, and format these log files. 7.6.3 Modifying the syslog-ng Rules Files The HP XC system supplies a default configuration of the syslog-ng rules.
9. Update the golden image to ensure a permanent change. For more information on updating the golden image, see Chapter 11 (page 141) . 7.7 The collectl Utility The collectl utility collects data on the nodes of the HP XC system. As a development or debug tool, the collectl utility typically gathers more detail more frequently than the supermon utility. The collectl utility does have some overhead, but for most situations, it consumes less than 0.
Example 7-1 Using the collectl Utility from the Command Line # collectl waiting for 10 second sample... ### RECORD 1 >>> n3 <<< (m.n) (date and time stamp) ### # CPU SUMMARY (INTR, CTXSW & PROC /sec) # USER NICE SYS IDLE WAIT INTR CTXSW PROC RUNQ RUN AVG1 AVG5 AVG15 0 0 0 99 0 1055 65 0 151 0 0.02 0.04 0.
By default, the collectl service gathers information on the following subsystems:
• CPU
• Disk
• Inode and file system
• Lustre file system
• Memory
• Networks
• Sockets
• TCP
• Interconnect
The collectl(1) manpage discusses running the collectl utility as a service.
7.7.3 Running the collectl Utility in a Batch Job Submission
You can run the collectl utility as one job in a batch job submission.
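For example, a job might launch the collectl utility on several nodes for the duration of a run; the node count, sample interval, sample count, and output location below are illustrative only:
# srun -N 4 collectl -i 5 -c 120 -f /hptc_cluster/collectl_data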
7.8 Using HP Graph To Display Network Bandwidth and System Use
The RRDtool software is integrated into the HP XC system to create and display graphs of network bandwidth and other system utilization. You can access this display by selecting HP Graph in the Nagios menu. Figure 7-4 is an example of the default display. It provides an overview of the system with graphs for node allocation, CPU usage, memory, Ethernet traffic, and, if relevant, Interconnect traffic.
Figure 7-4 HP Graph System Display By selecting an item in the menu in the upper left-hand side, you can specify the graphical data on any Nagios host. Figure 7-5 shows the graphical data for one node on the system.
NOTE: The detail graphs for a system display show the graphs for a specified metric on all the Nagios hosts. The detail graphs for a Nagios host display show all the applicable metrics for that Nagios host. 7.
Figure 7-5 HP Graph Host Display
The Metric menu influences the display of the detail graphs for a system display. This menu offers the following choices:
bytes in: This graph reports the rate of data received on all network devices on the node.
bytes out: This graph reports the rate of data transmitted on all network devices on the node.
cpu idle: This graph indicates how much of the node's CPU set was available for other tasks.
7.9 The resmon Utility The resmon utility is a job-centric resource monitoring Web page initially inspired by the open-source clumon product. It consists of a single Perl CGI script that, when viewed, invokes useful commands to collect and present data in a scalable and intuitive fashion. The Web pages update automatically at a preconfigured interval (120 seconds by default).
Figure 7-6 Resmon Web Page 7.
For more information on the resmon utility, see resmon(1).
7.10 The kdump Mechanism and the crash Utility
7.10.1 The kdump Mechanism
This release introduces kdump, a reliable mechanism to save crash dumps. The two major components to the kdump mechanism are:
• a minimal kernel
• initrd, a RAM disk file system that contains the drivers and the initialization code that enable the kernel to operate during the crash dump.
These components reside in a reserved area of memory at boot time.
rpm -ihv RPM
where RPM indicates the RPMs that need to be installed.
3. Distribute the software to the appropriate nodes in the HP XC system. See Chapter 11 (page 141) for more information.
4. If you have configured the kdump service, reboot all nodes except the head node to activate kdump.
During configuration with the nconfig and cconfig scripts, kdump updates the /boot/grub/grub.conf file to specify the crashkernel parameter. The node must be rebooted for the change to take effect. The /etc/init.
vmlinux The name of the statically-linked uncompressed kernel image. Its location is /usr/lib/debug/lib/modules/*rev-number/vmlinux. For more information, see crash(8).
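For example, the crash utility is typically invoked with the vmlinux file and a saved dump file; the vmcore location shown here is an assumption that depends on how kdump is configured on your system:
# crash /usr/lib/debug/lib/modules/`uname -r`/vmlinux /var/crash/vmcore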
8 Monitoring the System with Nagios The Nagios open source application has been customized and configured to monitor the HP XC system and network health. This chapter introduces Nagios and discusses these modifications.
“Messages Reported by Nagios” (page 251) describes troubleshooting information reported by Nagios. This section addresses the following topics:
• “Nagios Components” (page 106)
• “Nagios Hosts” (page 106)
• “Nagios Plug-Ins” (page 106)
• “Nagios Web Interface” (page 107)
• “Nagios Files” (page 107)
• Syslog alert monitor
• Syslog alerts status
• System event log
• System free space status report
For more information on the services monitored by Nagios and the type of function monitored for that service, see Table 8-2.
8.1.4 Nagios Web Interface
Nagios provides a Web interface capable of displaying current system and networking information in a browser window. See “Using the Nagios Web Interface” (page 107) for more information.
Figure 8-1 Nagios Main Window You can choose any of the options on the left navigation bar. These options are shown in Figure 8-2.
Figure 8-2 Nagios Menu (Truncated)
After you choose an option from the window, you are initially prompted for a login and a password. This login and the password were established when the HP XC system was configured. Usually, the login name is nagiosadmin. The Nagios passwords are maintained in the /opt/hptc/nagios/etc/htpasswd.users file. Use the htpasswd command to manipulate this file to add a user, to delete a user, or to change the user password.
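For example, the following command adds a user to that file (or changes the password of an existing user); the user name newadmin is hypothetical:
# htpasswd /opt/hptc/nagios/etc/htpasswd.users newadmin
New password:
Re-type new password: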
Nagios offers various views of the HP XC system.
The top of the window provides information on the network. It provides the number of network outages and information on the network health in terms of the Nagios hosts and Nagios services. The next portion of the window has information on the Nagios hosts. It reports the number of hosts down, unreachable, up, and pending. In the example, one host is down.
Figure 8-5 Nagios Service Information View You can also use the Nagios report generator utility, nrg, to obtain an analysis of the Nagios service (plug-in). Select the analyze option to display a two-column listing of service status. The following is the command line entry for this feature: # nrg --mode analyze nh [Slurm Monitor - Critical] 'sinfo' reported problems with nodes in some partitions, specifically, some nodes may be marked with an '*' which indicates they may be unresponsive to SLURM.
Figure 8-6 Nagios Service Problems View Selecting the link corresponding to the Nagios Host opens the Nagios Host Information view for that Nagios host. Figure 8-7 is an example of the Nagios Host Information view displayed by selecting the link for n15 in the Nagios Service Problems view shown in Figure 8-6. 8.
Figure 8-7 Nagios Host Information View
• “Forwarding Nagios e-mail Alerts” (page 116)
• “Changing Sensor Thresholds” (page 117)
• “Adjusting the Time Allotted for Metrics Collection” (page 118)
• “Changing the Default Nagios User Name” (page 119)
• “Disabling Individual Nagios Plug-Ins” (page 120)
8.3.1 Stopping and Restarting Nagios
Nagios can record a multitude of alerts on large systems when many nodes undergo known maintenance operations. These operations can include restarting or shutting down the HP XC system.
NOTE: If you change the nagios_vars.ini file, you must propagate it to all nodes. For more information, see Chapter 11 (page 141).
Figure 8-8 Nagios Configuration
When you change the Nagios configuration, you must perform the following tasks:
1. Read the Nagios documentation carefully.
2. Change the template files accordingly.
3. Stop the Nagios service. For instructions on how to stop the Nagios service, see “Stopping and Restarting Nagios” (page 115).
NOTE: Ensure that the sendmail utility is running. For information on the implementation of the sendmail utility on the HP XC system, see “Modifying Sendmail” (page 133). You can customize the Nagios configuration to specify whom to contact by editing the /opt/ hptc/nagios/etc/contacts.cfg file.
2. Rebuild vars.ini with the following command: # /opt/hptc/nagios/libexec/check_nagios_vars --rebuild 3. Run the following command to push the updated vars.ini file: # /opt/hptc/nagios/libexec/check_nagios_vars --update If you change the nagios_vars.ini file, be sure to propagate the file to the appropriate nodes, usually the management hubs, on your system; see Chapter 11 (page 141) for more information.
8.3.6 Changing the Default Nagios User Name Often the Nagios user name and user ID are established during the initial system configuration, that is, when the cluster_config utility is run. If a Nagios user name is found at that time, the HP XC system uses that user name and user ID instead of creating the default user name and user ID. However, you can configure the HP XC system to use an alternate nagios user and group account. Use the following procedure to change the default Nagios user name. 1. 2.
7. Create the ssh keys for the newname user account:
   # /opt/hptc/bin/ssh_create_shared_keys --user newname
8. Capture the Nagios keys and replicate them across the HP XC system:
   # tar cvf /hptc_cluster/newname_keys.tar /home/newname
   # pdsh -a -x nh "tar xvf /hptc_cluster/newname_keys.tar"
9. Verify that you can log in to a random node as the newname user:
   # ssh any_node -l newname
10. Use the nconfigure utility to reconfigure Nagios across the HP XC system:
   # pdsh -a "service nagios nconfigure"
This section contains information about HP XC Nagios configuration and procedures for configuring Nagios for your system. It addresses the following topics:
• “Monitored Nagios Services” (page 121)
• “Nagios Default Settings” (page 123)
• “Understanding Nagios Alert Messages” (page 125)
• “System Event Log Monitoring” (page 126)
8.4.1 Monitored Nagios Services
Table 8-2 lists each Nagios service, also known as a plug-in, that Nagios monitors, by category and function.
Table 8-2 Monitored Nagios Services
Category: Monitoring Plug-Ins
Configuration Monitor: This plug-in updates node configuration. It periodically generates and updates configuration display information for all nodes in the HP XC system (see Configuration).
Enclosure Monitor: Enclosures for HP Blade systems are represented as hosts in the Nagios web interface. The check_enclosures plug-in alerts you if sensor data for the enclosures is outside its operational range.
Table 8-2 Monitored Nagios Services (continued)
Category: System Service Reports
Apache HTTPS Server: This plug-in monitors the Web server providing the Nagios Web interface.
Root key synchronization: This plug-in verifies that the ssh configuration files are synchronized across the HP XC system.
Switch status: This plug-in gathers switch status and metrics through SNMP.
Service Description: Specifies the Nagios service name.
Actively launched on service node?: Indicates whether or not Nagios periodically runs this service check at the specified normal check interval.
Max Check Attempts: Indicates the number of times Nagios examines the service before reporting a failure.
Normal check: Indicates the frequency of the check interval.
Retry check interval: Indicates the amount of time Nagios waits before retrying after a failure.
NOTE: These default settings may have been altered by site customizations. To display the current values for your installation, use the Nagios Web interface: select View Config from the Configuration section under the Nagios menu. 8.4.3 Understanding Nagios Alert Messages The HP XC System Software provides several value-added plug-ins that can generate alert messages based on patterns provided by various data sources, such as syslog and the Hardware System Event logs.
[n47] Power Unit Power Redundancy Redundancy Lost
7 A date and time stamp indicating when the cause for the alert happened.
8 How long the message waited in the nand queue, that is, how much time elapsed before this message was mailed.
9 The nand sequence number.
The nand daemon receives and batches messages generated by Nagios and sends them by e-mail.
8.4.4 System Event Log Monitoring
This section explains the system event log and describes configuration details.
8.5 Using the Nan Notification Aggregator and Delimiter To Control Nagios Messages The HP XC System Software incorporates the Nan notification aggregator and delimiter for the Nagios paging system. Nan is an open source supplement to the Nagios application. Nagios is capable of sending quantities of messages especially when the system is starting up, shutting down, or experiencing a failure.
Example 8-1 The nrg Utility System State Analysis # nrg --mode analyze Nodelist ---------------------n[3-7] nh 128 Description --------------------------------------------------[Environment - NODATA] No sensor data is available for reporting. Use 'shownode metrics sensors -last 20m node xxxx' for each of these nodes to verify if sensor data has been recently collected. This status is drawn from the same source as the shownode metrics sensors command.
Run 'sinfo' for more information. n[3-7] [Slurm Status - Critical] sinfo reported problems with partitions for this node nh [Supermon Metrics Monitor - Critical] The metrics monitor has returned a critical status indicating a number of nodes have reported critical thresholds. If the actual status is 'Service timed out' then the monitor has taken too long to complete a single iteration.
— Ok
— Unknown
— Pending
• Nagios hosts status.
• Nagios services status.
• Nagios monitors status.
• A list of nodes that are up or down.
9 Network Administration This chapter addresses the following network topics: • • • • • “Network Address Translation Administration” (page 131) “Network Time Protocol Service” (page 132) “Changing the External IP Address of a Head Node” (page 133) “Modifying Sendmail” (page 133) “Bonding Ethernet Network Interface Cards for Failover” (page 133) 9.
Improved Availability Is Not in Effect You establish the external role assignment when you configure the HP XC system using the cluster_config utility. When nodes are configured as NAT clients, the default gateways are established. By default, each NAT client has a single gateway. If a NAT server fails, however, the NAT client loses connectivity. You can configure a system for multiple gateways to lessen the possibility of loss of connectivity, but the system may have performance problems.
Other tools (ntpq and ntpdc) are also available. For more information on NTP, see ntpd(1), ntpdc(1), and ntpq(1). 9.3 Changing the External IP Address of a Head Node Use the following procedure to change the external IP address of the head node: NOTE: This procedure requires you to reboot the head node. 1. Edit the /etc/sysconfig/netinfo file as follows: a. Specify the new head node external IP address in the --ip option of the network command. b.
You can implement Ethernet NIC bonding on any node, but not on the head node. Ethernet NIC bonding is implemented through node imaging; this allows its subsequent persistent automatic configuration when the node is reimaged. For more information on node imaging, see Chapter 11 (page 141). The /opt/hptc/config/nicbond/nicbond.staticdb file enables you to specify the key data to implement Ethernet NIC bonding.
a. Identify the node on which Ethernet NIC bonding will be configured. NOTE: b. The node must have at least two free Ethernet interfaces. Determine the names of the Ethernet devices and their MAC addresses for the Ethernet NICs you want to bond. Use the following command to find the Ethernet device information for all node types except the HP Integrity rx8620: # /usr/sbin/lshw -class network ...
# ifconfig -a | grep eth eth0 Link encap:Ethernet eth1 Link encap:Ethernet eth2 Link encap:Ethernet eth3 Link encap:Ethernet HWaddr HWaddr HWaddr HWaddr 00:00:00:00:00:00 00:00:00:00:00:10 00:00:00:00:00:20 00:00:00:00:00:30 The Ethernet device names are listed in the first column; the MAC addresses follow the designation HWaddr. c. d. 3. 4. Determine which Ethernet device should be the primary device.
10 Managing Patches and RPM Updates This chapter addresses the following topics: • • • • • “Sources for Software Packages and Information” (page 137) “Digital Signature” (page 137) “Downloading and Installing Patches” (page 138) “Rebuild Kernel Dependent Modules” (page 138) “Rebuilding Serviceguard Modules” (page 139) 10.
10.3 Downloading and Installing Patches Follow this procedure to download and install HP XC patches from the ITRC website: 1. Create a temporary patch download directory on the head node.
HP recommends that you rebuild the modules immediately after installing the new kernel and reboot the head node so that the updated modules are included in the golden image that is created by the cluster_config utility. 10.5 Rebuilding Serviceguard Modules The HP Serviceguard software contains two modules that are built against the kernel. You must use the following procedure to rebuild the pidentd and deadman modules if a new kernel is delivered in a patch.
11 Distributing Software Throughout the System This chapter addresses the following topics: • “Overview of the Image Replication and Distribution Environment” (page 141) • “Installing and Distributing Software Patches” (page 142) • “Adding Software or Modifying Files on the Golden Client” (page 143) • “Determining Which Nodes Will Be Imaged” (page 147) • “Updating the Golden Image” (page 147) • “Propagating the Golden Image to All Nodes” (page 151) • “Maintaining a Global Service Configuration” (page 151)
3. Distribute the golden image or individual files to the client nodes. See “Propagating the Golden Image to All Nodes” (page 151). 11.2 Installing and Distributing Software Patches The following is a generic procedure for installing software patches: 1. 2. Log in as superuser (root) on the head node. Use the rpm command to install the software package on the head node: # rpm -Uvh package.
11.3 Adding Software or Modifying Files on the Golden Client The first step in managing software changes to your HP XC system is to update the golden client node. This can involve adding new software packages, adding a new user, or modifying a configuration file that is replicated across the HP XC system, such as a NIS or NTP configuration file. Note: It is important to have a consistent procedure for managing software updates and changes to your HP XC system.
2. 3. Use the text editor of your choice to modify the OVERRIDES variable to match your overrides subdirectory name, /name. In the /var/lib/systemimager/scripts directory, create symbolic links to this master autoinstallation script for the nodes that will receive this override. The symbolic link names must follow the format name.sh, where name is the host name of each node to receive the override.
9. Link all the login nodes (that is, those nodes with the lvs service) to the compiler.master.0 autoinstallation script # for i in $(expandnodes $(shownode servers lvs)) do ln -sf compiler.master.0 $i.sh done 10. Verify that the links are correct # ls -l . . . lrwxrwxrwx lrwxrwxrwx lrwxrwxrwx 1 root 1 root 1 root root root root ... n7.sh -> compiler.master.0 ... n8.sh -> compiler.master.0 ... n9.sh -> compiler.master.0 Now the system can be imaged.
[base_image] DIR = /var/lib/systemimager/images/base_image [override_n8_override] DIR = /var/lib/systemimager/overrides/n8_override [override_base_image] DIR = /var/lib/systemimager/overrides/base_image [override_compiler] DIR = /var/lib/systemimager/overrides/compiler Save the file and exit the text editor. 8. Reimage the nodes: # setnode --resync --all # stopsys # startsys --image_and_boot 9. Verify that the propagation occurred as expected by examining the files on the node. 11.3.
known locations. The golden client file system and the cluster common storage are available to support such applications. Global service configuration scripts are located in the /opt/hptc/etc/gconfig.d directory. • Node-Specific Configuration The node-specific service configuration step uses the results of the global service configuration step described previously to apply to a specific node its “personality” with respect to the service.
Notes: Do not update the golden image file system directly. The cluster_config and updateimage utilities ensure that the golden image file structure remains synchronized. Before updating the golden image, make a copy in case you need to revert back. Use the SystemImager si_cpimage command to perform this task. Ensure that you have enough disk space in the target directory where the image will be saved; image sizes are typically 3 to 6 GB.
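A minimal sketch of such a backup, assuming the golden image is named base_image and that si_cpimage accepts source and destination image names as arguments (verify the exact syntax in si_cpimage(8) on your system), with a preliminary check of the free space in the image directory:
# df -h /var/lib/systemimager/images
# si_cpimage base_image base_image.bak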
NOTE: When the HP XC system is up and running, and you use the cluster_config utility to change a role, you are prompted to reconfigure SLURM. If you answer no, the existing SLURM configuration is preserved and is not modified in any way. However, if you made changes with the cluster_config utility that affect SLURM, such as adding a compute node or moving a resource management role, HP recommends that you answer yes to update the SLURM configuration.
When the updateimage command completes, the golden image is synchronized with the golden client. You are ready to deploy the golden image to all the nodes in your HP XC system. NOTE: By default, the updateimage command sets all client nodes to network boot from their Ethernet adapter, connected to the administration network, the next time the nodes are rebooted. This causes the nodes to reinstall themselves automatically, thus receiving the latest golden image.
11.5.4 Ensuring That the Golden Image Is Current Use the updateimage command or the cluster_config command to ensure that the golden image contains all the latest software for all the nodes. If you copy software directly to client nodes without updating the golden image first, that software will be deleted from the client nodes the next time the clients are re-imaged.
The global and node-specific configuration framework (gconfig and nconfig) configures additional node-specific services, as the node is initially installed, based on how the HP XC system was defined during configuration.
12 Opening an IP Port in the Firewall This chapter addresses the following topics: • • “Open Ports” (page 153) “Opening Ports in the Firewall” (page 154) 12.1 Open Ports Each node in an HP XC system is set up with an IP firewall, for security purposes, to block communications on unused network ports. External system access is restricted to a small set of externally exposed ports. Table 12-1 lists the base ports that are always open by default; these ports are labeled “External”.
Table 12-2 Service Ports Service Internal or External Port Number Protocol Comments Flamethrower Internal 9000 to 9020 udp The highest port number used is based on the number of modules configured to udpcast. Usually, the upper limit is 9020. LSF External 6878, 6881 to 6883, 7869 tcp/udp Only if the HP XC system is set up as a member of a larger LSF cluster. See “Installing LSF with SLURM into an Existing Standard LSF Cluster ” (page 291) for more information.
12.2.1 Opening a Temporary Port in the Firewall The openipport command enables the superuser to open an IP service port in the firewall using the following information: • • • The port number to open The protocol to be used The list of interfaces on which the port is to be opened NOTE: Use the openipport command judiciously. The port remains open unless or until the node is reimaged, even if the node is rebooted.
Notes: For clarity, the mnemonics for the interface are shown in bold and the noncomment lines span two lines. Noncomment lines each must take only one line in the iptables.proto file.
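To confirm which ports are actually open on a node after such changes, you can list the active firewall rules and listening sockets with standard Linux commands (these are generic checks, not HP XC-specific tools):
# iptables -L -n --line-numbers
# netstat -tln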
13 Connecting to a Remote Console
This chapter addresses the following topics:
• “Console Management Facility” (page 157)
• “Accessing a Remote Console” (page 157)
13.1 Console Management Facility
The Console Management Facility (CMF) daemon collects and stores console output for all other nodes on the system. This information is stored individually for every node and is backed up periodically. This information is stored under dated directories in the /hptc_cluster/adm/logs/cmf.dated/current directory.
6. Enter the escape character returned by the console command in Step 3 to end the connection. Note: Some nodes, depending on the machine type, accept a key sequence to enter and exit their command-line mode. See Table 13-1 to determine if these key sequences apply to your node machine type. Do not enter the key sequence to enter command-line mode. Doing so stops the Console Management Facility (CMF) from logging the console data for the node.
14 Managing Local User Accounts and Passwords This chapter describes how to add, modify, and delete a local user account on the HP XC system.
2. Collect as much of the following information about this account as possible:
   • Login name
     This information is required.
   • User's name
   • User's password
   Note: A customary practice is to assign a temporary password that the user changes with the passwd command, but this data must be propagated to all the other system nodes also. See “Distributing Software Throughout the System” (page 141) for more information.
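With that information in hand, the account is typically created on the golden client with the standard useradd and passwd commands; the following is a sketch for a hypothetical user named ssmith (options and names are illustrative):
# useradd -m -c "Sally Smith" ssmith
# passwd ssmith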
Note: Make sure that users who change their user account parameters do so on the golden client node, or that they notify you from which node they changed their parameters. You must propagate these user account changes to all the other nodes in the system as described in “Distributing Software Throughout the System” (page 141). 14.5 Deleting a Local User Account Remove a user account with the userdel command; you must be superuser on the golden client node to use this command.
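For example, to remove a hypothetical local account named ssmith (omit the -r option if the home directory resides on shared storage that you intend to keep):
# userdel -r ssmith
Remember to propagate the change to the other nodes as described above.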
3. Use the text editor of your choice as follows: a. Open the temporary file. b. Append the following lines to the temporary file: # Download NIS maps according to update frequency. 20 * * * * /usr/lib/yp/ypxfr_1perhour 40 6 * * * /usr/lib/yp/ypxfr_1perday 55 6,18 * * * /usr/lib/yp/ypxfr_2perday c. 4. Save your changes and exit the text editor. Replace the existing root user's crontab file with the temporary file: # crontab /tmp/root_crontab 5. Remove the temporary file: # rm -f /tmp/root_crontab 6.
4. Run the following commands to update all the appropriate files throughout the HP XC system:
   # pdcp -a -x `hostname` /etc/passwd /etc/passwd
   # pdcp -a -x `hostname` /etc/group /etc/group
   # pdcp -a -x `hostname` /etc/shadow /etc/shadow
   # pdcp -a -x `hostname` /etc/gshadow /etc/gshadow
When this step is complete, the root password is changed on all the nodes in the HP XC system. 14.8.
Generally, you can change the password by logging in to the switch and using its Quadrics Switch Control main menu as in the following example: Enter 1-9 and press return: 8 1. Passwd settings 2. Access protocol settings Enter 1,2 and press return: 1 changing password for quadrics Current Password: New password: Retype new password: 14.8.3.
3. Make a backup copy of the cliPassWord.crpt file: # cp /mnt/jffs/voltaire/config/cliPassWord.crpt \ /mnt/jffs/voltaire/config/cliPassWord.crpt.orig 4. Copy the /etc/shadow file into the cliPassWord.crpt file: # cp /etc/shadow /mnt/jffs/voltaire/config/cliPassWord.crpt 14.8.3.5 Cisco SFS 7000 Series InfiniBand Server Switches The instructions for changing the administrative passwords for Cisco® SFS 7000 Series InfiniBand Server Switches are located in the documentation for the specific switch model.
On HP XC systems whose head node is either an HP CP6000 system or an HP CP4000 system, you can obtain the sensor and System Event Log information only remotely.
14.8.5.1 Setting the BMC/IPMI Password Using an External Connection to the Head Node Management Port
Allow access to the head node's management port through the external IP address, and synchronize the management port passwords as shown by the following procedure:
NOTE: Substitute the node name (for example, n144) for the headnode variable.
1.
1. Log in as superuser (root) on the head node.
2. Use the passwd command to change the password locally on the head node:
   # passwd lsfadmin
   At this time, the lsfadmin account password is changed only on the head node.
3.
15 Managing SLURM The HP XC system uses the Simple Linux Utility for Resource Management (SLURM).
that the slurmctld daemon failed, the backup daemon assumes the responsibilities of the primary slurmctld daemon. On returning to service, the primary slurmctld daemon regains control of the SLURM subsystem from the backup slurmctld daemon. SLURM offers a set of utilities that provide information about SLURM configuration, state, and jobs, most notably scontrol, squeue, and sinfo. See scontrol(1), squeue(1), and sinfo(1) for more information about these utilities.
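For example, the following commands summarize partition and node state, list the job queue, and show the recorded attributes of a single node (the node name n1 is illustrative):
$ sinfo
$ squeue
$ scontrol show node n1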
Table 15-1 SLURM Configuration Settings (continued)
Setting          Default Value*
SlurmUser        'slurm'
NodeName         All compute nodes, plus 'Procs=2'
PartitionName    'lsf RootOnly=YES Shared=FORCE Nodes=compute_nodes'
SwitchType       switch/elan for systems with the Quadrics interconnect; switch/none for systems with any other interconnect
* Default values can be adjusted during installation.
You can also use the scontrol show config command to examine the current SLURM configuration.
TIP: Run the badmin reconfig command after the spconfig command to update LSF HPC with the information on each node's static resources (that is, core and memory), as reported by SLURM. 15.2.1 Configuring SLURM System Interconnect Support SLURM has system interconnect support for Quadrics ELAN, which assists MPI jobs with the global exchange process during startup, when each process is establishing the communication channels with the other processes in the job.
Weight The scheduling priority of the node. Nodes of lower priority are scheduled before nodes of higher priority, all else being equal. To change the configuration of a set of nodes, first locate the line in the slurm.conf file that starts with the following text to specify the configuration: NodeName= Multiple node sets are allowed on the HP XC system; the initial configuration specifies a single node set. Consider a system that has 512 nodes, and all those nodes are in the same partition.
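As a sketch only (node ranges, processor counts, memory sizes, and weights are illustrative), that single node set could be split into two sets so that the smaller-memory nodes are preferred by the scheduler:
NodeName=n[1-448] Procs=2 RealMemory=2048 Weight=1
NodeName=n[449-512] Procs=2 RealMemory=8192 Weight=10
After saving the change, run the scontrol reconfigure command so that SLURM rereads the configuration.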
Note: The root-only lsf partition is provided for submitting and managing jobs through an interaction of SLURM and LSF. If you intend to use SLURM independently from LSF, consider configuring a separate SLURM partition for that purpose. Table 15-2 describes the SLURM partition characteristics available on HP XC systems. Table 15-2 SLURM Partition Characteristics Characteristic Description Nodes List of nodes that constitute this partition.
PartitionName=lsf RootOnly=yes Shared=Force Nodes=n[1-128] PartitionName=cs Default=YES Shared=YES Nodes=n[129-256] If you make any changes, be sure to run the scontrol reconfigure command to update SLURM with these new settings. 15.2.5 Configuring SLURM Features A standard element of SLURM is the ability to configure and subsequently use a feature. You can use features to assign characteristics to nodes to manage multiple node types. SLURM features are specified in the slurm.conf file.
Example 15-1 Using a SLURM Feature to Manage Multiple Node Types a. Use the text editor of your choice to edit the slurm.conf file to change the node configuration to the following: NodeName=exn[1-64] Procs=2 Feature=single,compute NodeName=exn[65-96] Procs=4 Feature=dual,compute NodeName=exn[97-98] Procs=4 Feature=service Save the file. b. Update SLURM with the new configuration: # scontrol reconfig c. Verify the configuration with the sinfo command. The output has been edited to fit on the page.
POSIX message queues     (bytes, -q) 819200
stack size               (kbytes, -s) 10240
cpu time                 (seconds, -t) unlimited
max user processes       (-u) 8113
virtual memory           (kbytes, -v) unlimited
file locks               (-x) unlimited
Only soft resource limits can be manipulated. Soft and hard resource limits differ.
If you make any changes, be sure to run the scontrol reconfigure command to update SLURM with these new settings. If a user tries to propagate a resource limit with the srun --propagate command, but the compute node has a lower hard limit than the soft limit, an error message results: $ srun --propagate=RLIMIT_CORE . . . Can't propagate RLIMIT_CORE of 100000 from submit host. For more information, see slurm.conf(5). 15.
SLURM job accounting attempts to gather all the statistics available on the systems on which it is run.
Note: The bacct command reports a slightly increased value for a job's runtime when compared to the value reported by the sacct command. LSF with SLURM sums the resource usage values reported by itself and SLURM. 15.4.2 Disabling Job Accounting Job accounting is turned on by default. Note that job accounting is required if you are using LSF. Follow this procedure to turn off job accounting: 1. 2. Log in as the superuser on the SLURM server (see “Configuring SLURM Servers” (page 172)).
Note: You must specify an absolute pathname for the log file; it must begin with the / character. You can choose to isolate this data log on one node or in the /hptc_cluster directory so that all nodes can access it. However, this log file must be accessible to the following:
• Nodes that run the slurmctld daemon
• LSF
• Any node from which you execute the sacct command
Note: Ensure that the log file is located on a file system with adequate storage to avoid file system full conditions.
. # # o Define the job accounting mechanism # JobAcctType=jobacct/log # # o Define the location where job accounting logs are to # be written. For # - jobacct/none - this parameter is ignored # - jobacct/log - the fully-qualified file name # for the data file # JobAcctLogfile=/hptc_cluster/slurm/job/jobacct.log JobAcctFrequency=10 . . . g. 5. Save the file. Restart the slurmctld and slurmd daemons: # cexec -a "service slurm restart" 15.
# scontrol update NodeName=nodelist State=drain Reason="describe reason here" See “The nodelist Parameter” (page 33) for a discussion on the use of the nodelist parameter. The reason that you provide for the node draining is displayed by the sinfo command. Be brief but descriptive.
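A concrete sketch using hypothetical nodes n5 and n6 follows; the sinfo -R command shows the recorded reason, and the final command returns the nodes to service after maintenance (the RESUME state is standard SLURM behavior, but confirm it against scontrol(1) for your release):
# scontrol update NodeName=n[5-6] State=drain Reason="DIMM replacement"
# sinfo -R
# scontrol update NodeName=n[5-6] State=resume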
Table 15-4 Output of the sinfo command for Various Transitions (continued) Transition Cause: sinfo shows: Meaning: The System Administrator sets the node state to down. idle The node is ready to accept a job. down The slurmctld daemon has removed the node from service. down* The slurmctld daemon lost contact with the node (see sinfo -R). idle The node has been returned to service. alloc The node is running a job. drng SLURM is waiting for the job or jobs to finish.
15.8 Maintaining the SLURM Daemon Log By default SLURM daemon logs are stored in /var/slurm/log/ on each node that runs SLURM daemons. The slurmctld controller daemon writes to the slurmctld.log file, and the slurmd daemon writes to the slurmd.log file. These log files and their location are configured in the slurm.conf file. You can view this information with the scontrol command, as follows: # scontrol show config | grep LogFile SlurmctldLogFile = /var/slurm/log/slurmctld.
[root@n9]# grep processor /proc/cpuinfo | wc -l 2 c. Determine the amount of real memory in megabytes: [root@n9]# grep MemTotal /proc/meminfo MemTotal: 2056364 kB [root@n9]# expr 2056364 \/ 1024 2008 Note that the value for RealMemory for node n9 is 2008. d. Exit the session: [root@n9]# exit Connection to n9 closed. # 3. 4. 5. If the system has more than one partition, determine the partition to which the new node will be added. Save a backup copy of the /hptc_cluster/slurm/etc/slurm.conf file.
7. Execute the following command to reconfigure SLURM: # scontrol reconfigure 8. Verify the configuration with the following command: # sinfo PARTITION AVAIL lsf up TIMELIMIT NODES infinite 9 STATE NODELIST idle n[1-9] NOTE: If SLURM is not started on the new node, the sinfo command output shows it as down. Restart SLURM in that instance. 15.10 Removing SLURM The HP XC system installation process offers a choice of two different types of LSF.
16 Managing LSF The Load Sharing Facility (LSF) from Platform Computing Inc. is a batch system resource manager used on the HP XC system. LSF is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities.
Standard LSF is installed and configured on all nodes of the HP XC system by default. The LSF RPM places the LSF tar files from Platform Computing Inc. in the /opt/hptc/lsf/files/lsf/ directory. Standard LSF is installed, during the operation of the cluster_config utility, in the /opt/hptc/lsf/top directory.
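To pick up the LSF environment in an sh-compatible shell and confirm that the cluster responds, you can source the profile created under that directory and run standard LSF query commands (a sketch; csh users source cshrc.lsf instead):
$ . /opt/hptc/lsf/top/conf/profile.lsf
$ lsid
$ bhosts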
and monitoring layer for LSF with SLURM. LSF with SLURM uses SLURM interfaces to perform the following: • • • • • • To query system topology information for scheduling purposes. To create allocations for user jobs. To dispatch and launch user jobs. To monitor user job status. To signal user jobs and cancel allocations. To gather user job accounting information.
directly to the job. The SLURM srun command supports an --input option (also available in its short form as the -i option) that provides input to all tasks. 16.2.1.1 Job Starter Scripts LSF with SLURM dispatches all jobs locally. The default installation of LSF with SLURM on the HP XC system provides a job starter script that is configured for use by all LSF queues. This job starter script adjusts the LSB_HOSTS and LSB_MCPU_HOSTS environment variables to the correct resource values in the allocation.
batch job submissions or Standard LSF behavior in general, and creates the potential for a bottleneck in performance as both the LSF with SLURM daemons and local user tasks compete for processor cycles. The JOB_STARTER script has one drawback: all interactive I/O runs through the srun command in the JOB_STARTER script. This means full tty support is unavailable for interactive sessions, resulting in no prompting when a shell is launched.
Table 16-1 LSF with SLURM Interpretation of SLURM Node States (continued) Node Description In Use A node in any of the following states: Unavailable ALLOCATED The node is allocated to a job. COMPLETING The node is allocated to a job that is in the process of completing. The node state is removed when all the job processes have ended and the SLURM epilog program (if any) has ended. DRAINING The node is currently running a job but will not be allocated to additional jobs.
2. Rerun the cluster_config utility.
3. Proceed through the process until you reach the LSF section.
4. When you are prompted to configure LSF, enter yes.
5. When prompted, select the type of LSF you want to install:
   • Standard LSF is choice 1.
   • LSF with SLURM is choice 2, the default.
6. When prompted, enter d to delete the existing LSF installation.
7. Answer the remainder of the questions as appropriate for your system.
The cluster_config utility updates the golden image.
7.0 • • This directory remains in place and is imaged to each node of the HP XC system. The SLURM resource is added to the configured LSF execution host HP OEM licensing is configured. HP OEM licensing is enabled in LSF with SLURM by adding the following string to the configuration file, /opt/hptc/lsf/top/conf/lsf.conf. This tells LSF with SLURM where to look for the shared object to interface with HP OEM licensing. XC_LIBLIC=/opt/hptc/lib/libsyslic.
This command searches through a list of nodes with the lsf service until it finds a node to run LSF with SLURM. Alternatively, you can invoke the following command to start LSF with SLURM on the current node: # controllsf start here 16.5.2 Shutting Down LSF with SLURM At system shutdown, the /etc/init.d/lsf script ensures an orderly shutdown of LSF with SLURM.
Example 16-3 Basic Job Launch Without the JOB_STARTER Script Configured $ bsub -I hostname Job <20> is submitted to default queue . <> <> n120 Example 16-4 is a similar example, but 20 processors are reserved. Example 16-4 Launching Another Job Without the JOB_STARTER Script Configured $ bsub -I -n20 hostname Job <21> is submitted to default queue . <> <
You can use the -l (long) option to obtain detailed information about a job, as shown in this example: $ bjobs -l 116 Job <116>, User , Project default, Status , Queue , Co mmand date time: Submitted from host , CWD <$HOME>, Ou tput File <./>, 8 Processors Requested; date time: Started on 8 Hosts/Processors <8*lsfhost.
16.9 Maintaining Shell Prompts in LSF Interactive Shells Launching an interactive shell under LSF with SLURM removes shell prompts. LSF with SLURM makes use of a JOB_STARTER script for all queues; this script is configured by default. It uses the SLURM srun command to ensure that user jobs run on the first node in the allocation instead of the node from which the job was invoked. Follow this procedure to edit the JOB_STARTER script to prevent the removal of shell prompts: 1. 2. Log in as superuser (root).
n4 n4 n5 n5 n5 n5 [lsfadmin@n4 ~]$ exit exit [lsfadmin@n16 ~]$ hostname n16 [lsfadmin@n16 ~]$ 16.10 Job Accounting Standard LSF job accounting using the bacct command is available. The output of a job contains total CPU time and memory usage: $ cat 231.out . . . Resource usage summary: CPU time : Max Memory : Max Swap : 8252.65 sec. 4 MB 113 MB . . .
# shownode servers lsf
n[15-16]
# pdsh -w n[15-16] "mkdir -p /var/lsf/log.old"
# pdsh -w n[15-16] "mv /var/lsf/log/* /var/lsf/log.old/"
LSF continues to write to the original log files. Now you can either archive the log.old directory or delete its contents. You can automate the procedure for caching LSF log files by using a cron job on the head node set for an interval appropriate for your site, as in the sketch that follows. 16.
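One possible crontab entry for the head node (a sketch only; the schedule, node list, and directory names are illustrative and should be adapted to your site) rotates the LSF logs every Sunday at 02:00:
0 2 * * 0 pdsh -w n[15-16] "mv /var/lsf/log/* /var/lsf/log.old/"
Add the entry with the crontab -e command as superuser.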
In these examples, 22 processors on this HP XC system are available for use by LSF with SLURM. You can verify this information, which is obtained by LSF with SLURM, with the SLURM sinfo command:
$ sinfo --Node --long
date and time
NODELIST   NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT FEATURES REASON
n[1-10,16] 11    lsf       idle  2    2:1:1 2048   1        1      (null)   none
The output of the sinfo command shows that 11 nodes are available, and that each node has 2 processors.
16.14.1 Overview of LSF with SLURM Monitoring and Failover Support LSF with SLURM failover is disabled by default. You can enable or disable LSF with SLURM failover at any time with the controllsf command. For more information, see controllsf(8). Note: At least two nodes must have the resource management roles to enable LSF with SLURM failover. One is selected as the master (primary LSF execution host), and the others are considered backup nodes.
If two nodes are assigned the resource management role, by default, the first node becomes the primary resource management node, and the second node is the backup resource management node. If more than two nodes are assigned the resource management role, the first becomes the primary resource management host and the second becomes the backup SLURM host and the first LSF with SLURM failover candidate.
16.14.5 Manual LSF with SLURM Failover
Use the following procedure if you need to initiate a manual LSF with SLURM failover, that is, to move LSF with SLURM from one node to another. You might need to perform this operation, for example, to perform maintenance on the LSF execution host.
1. Log in as the superuser (root).
2. Use the following command to stop LSF with SLURM:
   # controllsf stop
3.
To move LSF and SLURM back to the original primary node, follow the same procedure with the assumption that the original primary node is now the backup node, and the original backup node is now the primary node. 16.16 Enhancing LSF with SLURM You can set environment variables to influence the operation of LSF with SLURM in the HP XC system. These environment variables affect the operation directly and set thresholds for LSF with SLURM and SLURM interplay. 16.16.
Table 16-3 Environment Variables for LSF with SLURM Enhancement (lsf.conf File) (continued) Environment Variable Description LSB_SLURM_NONLSF_USE=Y LSF with SLURM is configured to end unrecognized jobs in the SLURM lsf partition periodically. This entry stops LSF with SLURM from periodically terminating unrecognized jobs in the SLURM lsf partition. LSF_ENABLE_EXTSCHEDULER=Y|y This setting enables external scheduling for LSF with SLURM The default value is Y, which is automatically set by lsfinstall.
Table 16-3 Environment Variables for LSF with SLURM Enhancement (lsf.conf File) (continued) Environment Variable Description LSF_HPC_EXTENSIONS="ext_name,..." This setting enables Platform LSF extensions. This setting is undefined by default. The following extension names are supported: • SHORT_EVENTFILE This compresses long host name lists when event records are written to the lsb.events and lsb.acct files for large parallel jobs.
Table 16-3 Environment Variables for LSF with SLURM Enhancement (lsf.conf File) (continued) Environment Variable Description LSF_HPC_NCPU_INCREMENT=increment This entry in the lsf.conf file defines the upper limit for the number of processors that are changed since the last checking cycle. The default value is 0. LSF_HPC_NCPU_INCR_CYCLES=icycles This entry specifies the minimum number of consecutive cycles in which the number of processors changed does not exceed LSF_HPC_NCPU_INCREMENT.
Table 16-4 Environment Variables for LSF with SLURM Enhancement (lsb.queues File) Environment Variable Description DEFAULT_EXTSCHED= SLURM[options[;options]...] This entry specifies SLURM allocation options for the queue. The -ext options to the bsub command are merged with DEFAULT_EXTSCHED options, and -ext options override any conflicting queue-level options set by DEFAULT_EXTSCHED.
Table 16-4 Environment Variables for LSF with SLURM Enhancement (lsb.queues File) (continued) Environment Variable Description LSF with SLURM uses the resulting options for scheduling: SLURM[nodes=3;contiguous=yes;tmp=200] The nodes=3 specification in the -ext option overrides the nodes=2 specification in DEFAULT_EXTSCHED, and tmp= in the -ext option overrides the tmp=100 setting in DEFAULT_EXTSCHED.
1. If LSF with SLURM failover is enabled, ensure that each node with the resource management role has an external network connection. Run the following command to confirm this: # shownode roles --role resource_management external resource_management: n[127-128] external: n[125-128] resource_management: n[127-128] external: n[125-128] 2. Obtain a virtual IP and corresponding host name, and ensure that they are not already in use: # ping -i 2 -c 2 xclsf PING xclsf (10.10.123.
17 Managing Modulefiles This chapter describes how to load, unload, and examine modulefiles. Modulefiles provide a mechanism for accessing software commands and tools, particularly for third-party software. The HP XC System Software does not use modules for system-level manipulation. A modulefile contains the information that alters or sets shell environment variables, such as PATH and MANPATH. Some modulefiles are provided with the HP XC System Software and are available for you to load.
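For example, the following sequence lists the available modulefiles, loads one, verifies that it is loaded, and unloads it again (the modulefile name mpi/hp/default is taken from examples elsewhere in this document; the modulefiles on your system may differ):
$ module avail
$ module load mpi/hp/default
$ module list
$ module unload mpi/hp/default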
18 Mounting File Systems This chapter provides information and procedures for performing tasks to mount file systems that are internal and external to the HP XC system. It addresses the following topics: • “Overview of the Network File System on the HP XC System” (page 217) • “Understanding the Global fstab File” (page 217) • “Mounting Internal File Systems Throughout the HP XC System” (page 219) • “Mounting Remote File Systems” (page 224) 18.
Example 18-1 Unedited fstab.proto File # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead. # # How this file is organized: # # * Comments begin with # and continue to the end of line # # * Each non-comment line is a line that may be copied # to /etc/fstab verbatim.
The file systems can be either of the following: • External to the node, but internal to the HP XC system. “Mounting Internal File Systems Throughout the HP XC System” (page 219) describes this situation. The use of csys is strongly recommended. For more information, see csys(5). • External to the HP XC system. “Mounting Remote File Systems” (page 224) describes this situation. NFS mounting is recommended for remote file system mounting. 18.
Figure 18-1 Mounting an Internal File System
(Figure: within the HP XC cluster, node n60 serves its local /dev/sdb1 device as /scratch, which is mounted on nodes n61 through n63.)
18.3.1 Understanding the csys Utility in the Mounting Instructions
The csys utility provides a facility for managing file systems on a systemwide basis. It works in conjunction with the mount and umount commands by providing a pseudo file system type. The csys utility is documented in csys(5).
options Specifies a comma-separated list of options, as defined in csys(5). The hostaddress and _netdev options are mandatory. • The hostaddress argument specifies the node that serves the file system to other nodes and specifies the network (administration or system interconnect) to be used. The hostaddress is specified either by its node name or by its IP address.
Note: The node that exports the file system to the other nodes in the HP XC system must have the disk_io role. 3. Determine whether you want to mount this file system over the administration network or over the system interconnect. As a general rule, specify the administration network for administrative data and the system interconnect for application data. 4. Edit the /hptc_cluster/etc/fstab.proto file as follows: a.
7. Verify the internal file system mounting by entering the following command, which ensures that the file system is mounted on the nodes: # cexec -a "mount | grep /scratch" n62: 192.168.0.60:/scratch on /scratch type nfs (rw,hard,intr,bg,rsize=8192,wsize=8192,addr=192.168.0.60) n61: 192.168.0.60:/scratch on /scratch type nfs (rw,hard,intr,bg,rsize=8192,wsize=8192,addr=192.168.0.60) n63: 192.168.0.60:/scratch on /scratch type nfs (rw,hard,intr,bg,rsize=8192,wsize=8192,addr=192.168.0.
Example 18-2 The fstab.proto File Edited for Internal File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
Figure 18-2 Mounting a Remote File System
(Figure: the external server xeno exports the /extra file system, which is mounted on HP XC cluster nodes n21 through n25.)
18.4.1 Understanding the Mounting Instructions
The syntax of the fstab entry for remote mounting using NFS is as follows:
exphost:expfs mountpoint fstype options
exphost    Specifies the external server that is exporting the file system. The exporting host can be expressed as an IP address or as a fully qualified domain name.
18.4.2 Mounting a Remote File System Use the following procedure to mount a remote file system to one or more nodes in an HP XC system: 1. Determine which file system to export. In this example, the file system /extra is exported by the external server xeno. 2. Ensure that this file system can be NFS exported. Note: This information is system dependent and is not covered in this document. Consult the documentation for the external server. 3. 4. Log in as superuser on the head node.
Example 18-3 The fstab.proto File Edited for Remote File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
19 Managing Software RAID Arrays The HP XC system can mirror data on a RAID array. This chapter addresses the following topics: • • • • • • “Overview of Software RAID” (page 229) “Installing Software RAID on the Head Node” (page 229) “Installing Software RAID on Client Nodes” (page 229) “Examining a Software RAID Array” (page 230) “Error Reporting” (page 231) “Removing Software RAID from Client Nodes” (page 231) 19.
the cluster_config utility to update the system configuration database with node role assignments, start all services on the head node, and re-create or update the golden system image.
Use the following procedure to install software RAID on client nodes:
1. Log in as superuser (root) on the head node.
2. Generate a list of nodes on whose disks you will install the HP XC System Software with software RAID.
3. Edit the /etc/systemimager/systemimager.
    Update Time : date and time
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
           UUID : eead90a0:35c0bf46:9160b26b:2d754a4d
         Events : 0.10
Nagios uses the mdadm command to verify the status of the RAID array.
19.5 Error Reporting
Errors can be reported during the installation of software RAID on a client node.
20 Using Diagnostic Tools This chapter discusses the diagnostic tools that the HP XC system provides. It addresses the following topics: • • • • “Using the sys_check Utility” (page 233) “Using the ovp Utility for System Verification” (page 233) “Using the dgemm Utility to Analyze Performance” (page 239) “Using the System Interconnect Diagnostic Tools” (page 240) Troubleshooting procedures are described in Chapter 21: Troubleshooting (page 247). 20.
• • • The administration network is operational. All application nodes are responding and available to run applications. The nodes in the HP XC system are performing optimally. These nodes are tested for the following: — CPU core usage — CPU core performance — Memory usage — Memory performance — Network performance under stress — Bidirectional network performance between pairs of nodes — Unidirectional network performance between pairs of nodes For a complete list of verification tests, see ovp(8).
Test list for license: file_integrity server_status Test list for SLURM: spconfig daemon_responds partition_state node_state Test list for LSF: identification hosts_static_resource_info hosts_status Test list for interconnect: myrinet/monitoring_line_card_setup Test list for nagios: configuration Test list for xring: xring (X) Test list for perf_health: cpu_usage memory_usage cpu memory network_stress network_bidirectional network_unidirectional Test list for myrinet_status: myrinet_status An 'X' indicates
By default, if any part of the verification fails, the ovp command ignores the test failure and continues with the next test. You can use the --failure_action option to control how the ovp command treats test failures. When you run the ovp command as superuser (root), it stores a record of the verification in a log file in the /hptc_cluster/adm/logs/ovp directory.
benchmark's Alltoall, Allgather, and Allreduce tests. Perform these tests on a large number of nodes for the most accurate results. The default value for the number of nodes is 4, which is the minimum value to use. The --all_group option enables you to select the node grouping size. network_bidirectional Tests network performance between pairs of nodes using the Pallas benchmark's Exchange test. network_unidirectional Tests network performance between pairs of nodes using the HP MPI ping_pong_ring test.
Testing cpu_usage ... The headnode is excluded from the cpu usage test. Number of nodes allocated for this test is 14 Job <102> is submitted to default queue . <> <>> All nodes have cpu usage less than 10%. +++ PASSED +++ This verification has completed successfully. A total of 1 test was run. Details of this verification have been recorded in: /hptc_cluster/lsf/home/ovp_n16_mmddyy.
/hptc_cluster/root/home/ovp_n16_mmddyy.tests Details of this verification have been recorded in: /hptc_cluster/root/home/ovp_n16_mmddyyr1.log 20.3 Using the dgemm Utility to Analyze Performance You can use the dgemm utility, in conjunction with other diagnostic utilities, to help detect nodes that may not be performing at their peak performance. When a processor is not performing at its peak efficiency, the dgemm utility displays a WARNING message.
2. Load the mpi/hp/default modulefile: # module load mpi/hp/default 3. Invoke the following command if you are superuser (root): # mpirun -prot -TCP -srun -v -p lsf -n max \ /opt/hptc/contrib/bin/dgemm.x Invoke the following command if you are not superuser: $ bsub -nmax -o ./ mpirun -prot -TCP -srun -v -n max \ /opt/hptc/contrib/bin/dgemm.x The max parameter is the maximum number of processors available to you in the lsf partition.
Fan speed    The fan speed should be above the minimum.
The gm_prodmode_mon diagnostic tool searches /etc/hosts for entries whose name matches the regular expression “MR0[NT][0-9][0-9]”. This command uses the links -dump command to obtain the current values and parses the output. The gm_prodmode_mon diagnostic tool generates an alert if any errors are found. All alerts are logged in the /var/log/messages file.
health to the network, either to a log file for generic usage or through a MySQL database (as is the case in HP XC systems). In addition to logging errors in the QsNet database, the swmlogger daemon also logs all errors to the /var/log/messages file. See the diagnostics section of the installation and operation guide for your model of HP cluster platform for additional information on the generic use of swmlogger.
-r rail Specifies the rail number. The default is 0. -clean Clears out the log file directory. This ensures that if the file already exists, the old data is deleted before the new test is run, to ensure that the data is fresh from the current run. HP recommends using this option. Specifies that you want to run this test only on a subset of nodes; the nodes parameter is a comma-separated list of nodes. The default is to run this test on all nodes.
-v Specifies verbose output, which is required to identify which component or location is causing errors. -t timeout Specifies the timeout value (in seconds), that is, the length to wait for any test to finish. The default value is 300. -N nodes Enables you to run the qsnet2_level_test on only a subset of nodes. The argument nodes is a comma-separated list, for example: n1,n2,n4. The default operation is to run the qsnet2_level_test utility on all nodes.
Killed Test ran on: n1,n2,n3 Parsing output level3: n1 - (NodeId = 4) ERROR: Test incomplete level3: n2 - (NodeId = 3) ERROR: Test incomplete level3: n3 - (NodeId = 2) ERROR: Test incomplete Parsing complete Example 4 The following example parses the output files created from a previous run of this command. This example specifies the log file directory created after unzipping and extracting the qsnet2_drain_test log file, which is described in the next section.
The output that ib_prodmode_mon produces identifies the bad links so that you can take corrective action.
21 Troubleshooting This chapter provides information to help you troubleshoot problems with HP XC systems.
21.1.2 Mismatched Secure Shell Keys If a node on your system has a mismatched Secure Shell (ssh) key, review the following list for the source of the problem: • The node was not imaged, and was booted an old image, which had older ssh keys. In this instance, it is the image, not the keys, that is out of synchronization. You can solve this problem by imaging the node properly and rebooting. • The keys were regenerated on the head node.
NOTE: Nagios runs only on nodes with the management_server or management_hub roles. See “Messages Reported by Nagios” for additional information.
21.2.1 Determining the Status of the Nagios Service
Use the following command to determine if Nagios is running properly:
# pdsh -a "service nagios status"
Nagios ok: located 1 process, status log updated 22 seconds ago
Gathering status for nrpe ...
n[3-8] NRPE v2.0 - n[3-8]
Nagios nsca:
n7: 0 data packet(s) sent to host successfully.
21.2.4 Running Nagios Plug-Ins Manually The Nagios plug-ins are located in the /opt/hptc/nagios/libexec directory. You can invoke them from the command line if needed. The following example shows the procedure to run the Nagios check_sel plug-in from the command line: Example 21-1 Running a Nagios Plug-In from the Command Line 1. 2. Log in as the nagios user. Proceed to the /opt/hptc/nagios/libexec directory. $ cd /opt/hptc/nagios/libexec 3.
normal, they indicate data has not yet been received by the Nagios engine. Service *may* be fine, but if it continues to pend for more then about 30 minutes it may indicate data is not being collected. n15 [NodeInfo - ASSUMEDOK] Pending services are normal, they indicate data has not yet been received by the Nagios engine. Service *may* be fine, but if it continues to pend for more then about 30 minutes it may indicate data is not being collected.
Service: Environment Status Information: Node sensor status A warning or critical message indicates that one or more monitored sensors reported that a threshold has been exceeded. Correct the condition. Service: Load Average Status Information: Node Load Ave: x/y/z QueLen: n A warning or critical message indicates that load average thresholds for the specific node have been exceeded. Thresholds can be set on a per-node, per-class, or per-system basis in the nagios_vars.ini file.
Service: Root key synchronization Status Information: Root SSH key synchronization status This entry provides the status of the root key synchronization. A warning or critical message indicates that the root ssh keys for one or more hosts are out of synchronization with the head node. The ssh and pdsh commands may not work for these nodes. Verify that the imaging is correct on the affected nodes.
Typically, this entry reports the number of alerts in a specified period of time and allows you to access the most recent log. A warning or critical message indicates that one or more rules defined in the /opt/hptc/nagios/ etc/syslogAlertRules file matches the specified node's consolidated log file. Take the appropriate action based on the message.
gm-2.1.7_Linux-2.1hptc m3-dist-1.0.14-1 mute-1.9.6-1 . . . The version numbers for your HP XC system may differ from these. 6. Run the lsmod command to display loaded modules. You should have one Myrinet GM loadable module installed. # lsmod | grep -i gm gm 589048 3 The size may differ from this output. 7. The Myrinet myri0 interface should be up. Use the ifconfig command to display the interface network configuration: # ifconfig myri0 myri0 Link encap:Ethernet HWaddr 00:60:DD:49:2D:DA inet addr:172.
qsqsql-1.0.12-2.2hpt . . . The version numbers for your HP XC system may differ from these. 5. Run the lsmod command to display loaded modules. You should have eight Quadrics loadable modules installed. # lsmod jtag eip ep rms elan4 elan3 elan qsnet 30016 86856 821112 48800 466352 606676 80616 101040 0 1 9 0 1 0 0 0 (unused) [eip] [ep] [ep] [ep elan4 elan3] [eip ep rms elan4 elan3 elan] The sizes may differ from this output. 6.
http://www.openfabrics.org/ To determine if your HP XC system is configured properly, perform the following steps on any node on which you suspect a problem: 1. Use the lspci command to ensure that your system has InfiniBand boards installed, and that the operating system detects them: [root@n1 ~]# lspci -v 44:00.
If you see no output at all, or if you see an error message, it is possible that the InfiniBand stack or kernel is improperly installed, see the following troubleshooting steps starting from #4 below to verify their proper installation. 3.
ofed-scripts-1.1-0.noarch.rpm
openib-diags-1.1.0-0
openmpi_gcc-1.1.1-1
perftest-1.0-0
tvflash-0.9.0-0
ib-enhanced-services-0.9.0-1.1hptc
vltmpi-OPENIB-1.0.0_20-1.1hptc
The version numbers may differ, but it is important that the version of the kernel you are using be present in the version number of the kernel-ib RPM.
6. Verify that the OFED InfiniBand loadable modules are installed:
# lsmod
...
rdma_ucm   15232  0
rdma_cm    28688  1 rdma_ucm
ib_addr    10120  1 rdma_cm
ib_ipoib   60440  0
...
0:00:00:00:00:00:00 inet6 addr: fe80::000:0000:0000:000e/64 Scope:Link UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:22989 errors:0 dropped:0 overruns:0 frame:0 TX packets:35 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:256 RX bytes:1299245 (1.2 MiB) TX bytes:5572 (5.4 KiB) The interface (or interfaces) should be reported as UP and RUNNING.
21.5.3 Known Limitation if Nagios is Configured for Improved Availability When Nagios is configured for improved availability, by default, the head node acts as the Nagios master and the other node in the availability set acts as a Nagios monitor. The known limitation is that you cannot use the /sbin/service command to restart the Nagios monitor on the non-head node in the availability set because of new functionality provided by the /sbin/service script.
21.6 SLURM Troubleshooting The following section discusses SLURM troubleshooting in terms of configuration issues and run-time troubleshooting. 21.6.1 SLURM Configuration Issues SLURM consists of the following primary components: slurmctld a master/backup daemon. slurmd a slave daemon. Command binaries The sinfo, srun, scancel, squeue, and scontrol commands. slurm.conf The SLURM configuration file, /hptc_cluster/slurm/etc/ slurm.conf.
# /opt/hptc/etc/nconfig.d/C50gather_data 3. Run the spconfig command from the head node again. 21.6.2 SLURM Run-Time Troubleshooting The following describes how to overcome problems reported by SLURM while the HP XC system is running: Healthy node is down The most common reason for SLURM to list an apparently healthy node down is that a specified resource has dropped below the level defined for the node in the /hptc_cluster/ slurm/etc/slurm.conf file.
• Verify that the /hptc_cluster directory (file system) is properly mounted on all nodes. SLURM relies on this file system.
• Ensure that SLURM is configured, up, and running properly.
• Examine the SLURM log files in the /var/slurm/log/ directory on the SLURM master node for any problems.
• If the sinfo command reports that the node is down and daemons are running, examine the available processors vs. the Procs setting in the slurm.conf file.
• Ensure that the lsf partition is configured correctly.
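A few standard commands cover several of these checks quickly (a sketch; lsid and bhosts are standard LSF commands, and sinfo is described in Chapter 15):
# df /hptc_cluster
# sinfo -p lsf
$ lsid
$ bhosts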
At the next instance of the queue's RUN_WINDOW, the job resumes execution and the other jobs can be scheduled. Consider this example: 1. 2. 3. 4. 5. 6. 7. 8. Job #75 is scheduled on a queue named night. The RUN_WINDOW opens for the night queue. Job #75 runs on the night queue. The RUN_WINDOW for the night queue ends but Job #75 did not complete. Job #75 is suspended. Job #76 is scheduled on a higher priority queue named main but is suspended.
22 Servicing the HP XC System This section describes procedures for servicing the HP XC system. For more information, see the service guide for your cluster platform. This chapter addresses the following topics: • • • • • • “Adding a Node” (page 267) “Replacing a Client Node” (page 269) “Actualizing Planned Nodes” (page 270) “Replacing a System Interconnect Board in an HP CP6000 System” (page 273) “Software RAID Disk Replacement” (page 274) “Incorporating External Network Interface Cards” (page 277) 22.
DYNAMIC_DISK_PROCESSING = FALSE 4. Save the file and close the text editor. You must also modify the disk size parameters in the /opt/hptc/systemimager/etc/model.conf file on the head node (where model refers to the model type). 8. If your system is configured for improved availability, enter the transfer_from_avail command: # transfer_from_avail 9. Run the cluster_config utility to configure the nodes and set the imaging environment: # .
22.2 Replacing a Client Node The following procedure describes how to replace a faulty client node in an HP XC system. The example commands in the procedure use node n3. CAUTION: Do not use this procedure to replace the head node. The replacement node must have the identical (exact) hardware configuration to the node being replaced; the following characteristics must be identical: • • • Number of processors Memory size Number of ports 1.
Notes: The -oldmp option is also required for CP6000 systems because their management processors (MPs) have statically-set IP addresses and are not configured to use DHCP.
Use the following procedure to actualize planned nodes: CAUTION: • Ensure that the node you install has the same architecture as the nodes in the HP XC system. For example, if the HP XC system is made up of nodes in the CP6000 Cluster Platform, the actual nodes must also be in the CP6000 Cluster Platform. • This procedure makes use of the discover utility.
8. Run the following utility: # /opt/hptc/hpcgraph/sbin/hpcgraph-setup 9. Image the added nodes as follows: # startsys --image_and_boot nodelist Where nodelist is the list of the actualized nodes. 10. Verify the existence of the actualized nodes with any of the following utilities: • ovp • shownode The actualized nodes are enabled. Other planned nodes that were not actualized are shown as disabled. • sinfo The actualized nodes are shown as up.
6. Follow these steps to find the MAC address of the new Onboard Administrator. a. Connect a terminal device to the port of the Onboard Administrator. b. Log in to the Onboard Administrator using the administrator password you set in Step 5. c. Enter the show oa network command at the prompt: OA-#############> show oa network Onboard Administrator #1 Network Information DHCP: Enabled - Dynamic DNS IP Address: 172.31.32.3 Netmask: 255.255.192.0 Primary DNS: 172.31.15.
2. Ensure that no applications are running on the node with the faulty system interconnect board: # scontrol update NodeName=n3 State=DRAINING \ Reason="Removing node" 3. Use the stopsys command to shut down the node (on which the card is to be replaced) in an orderly way: # stopsys n3 4. If the faulty system interconnect board is on the head node, shut down the node with the shutdown command: # shutdown -h now WARNING! Verify that the power is disconnected to prevent injury or death. 5. 6. 7.
1. Examine the array.
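For example (device names are illustrative; substitute the arrays configured on your node), either of the following shows the state of the mirrors and whether a member has failed:
# mdadm --detail /dev/md1
# cat /proc/mdstat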
7. Partition the new disk.
8. Add the new partitions back to their arrays:
   # mdadm /dev/md1 -a /dev/sdb1
   # mdadm /dev/md2 -a /dev/sdb2
   # mdadm /dev/md3 -a /dev/sdb3
   The new partition begins synchronizing with the existing corresponding partition automatically.
9. Use the following two commands to update the mdadm configuration file, /etc/mdadm.conf, which the mdadm command uses to manage the RAID arrays.
1. Use the systemconfigurator command as follows: # /usr/bin/systemconfigurator -runboot -stdin <
• • • “Reconfiguring the Nodes” (page 289) “Verifying Success” (page 289) “Updating the Golden Image” (page 290) 22.7.1 Gathering Information You need to gather information on the nodes, the NIC, and the network to incorporate an external NIC. This section discusses how to acquire that information and provides a worksheet you can use to note the settings for your system.
Complete the corresponding portions of Table 22-1 (page 281) with the information from this section. 22.7.1.2 Determining NIC-Specific Information For most model types, you need to know the following Ethernet interface data for the NIC: • the PCI bus ID NOTE: • • The PCI bus ID does not apply to the model type rx8620 server.
product: NetXtreme BCM5721 Gigabit Ethernet PCI Express vendor: Broadcom Corporation physical id: 0 bus info: pci@03:00.0 logical name: eth1 version: 11 serial: 00:00:00:00:00:01 description: Ethernet interface product: NetXtreme BCM5703 Gigabit Ethernet vendor: Broadcom Corporation physical id: 1 bus info: pci@05:01.0 logical name: eth2 version: 10 serial: 00:00:00:00:00:02 ... The dd variable was used in this example to denote the last portion of the MAC address.
Complete the corresponding portions of Table 22-1 (page 281) with the information from this section. 22.7.1.3 Gathering Networking Information The following networking information is required to configure the NIC: • • • • • • The external host name for the node The external IP address Optionally, the external IPv6 address The netmask The gateway, that is, the system that acts as a gateway to external communications Optionally, the largest packet size (MTU).
Administration Port, the Interconnect port, and the External port. These values can be stated as Ethernet device names, PCI bus IDs, and the literal strings undef and offboard. The following is an example of an entry for the HP Integrity rx2600 in the [type modeltype] section of the platform_vars.
Table 22-2 Modelmap Values (continued) Column Values Interconnect The port specified in this column is used for the system interconnect. Valid values for this column are: Bus_ID1 Indicates the hardware PCI bus ID, for example, 20:02.1; this is the most reliable method for designating a physical Ethernet port. ethn Indicates an Ethernet device, starting with eth0. undef Indicates this value is undefined for interconnect port in this configuration.
7. Change the text in the External column accordingly: • If the entry in the External column is offboard, change that text to the PCI bus ID of the first added NIC. Before: After: offboard 06:01.0 If there is more than one NIC, add a comma character (,) then enter the PCI bus ID of the next added NIC. For example: Before: After: offboard 06:01.0,06:01.1 Repeat as necessary. IMPORTANT: The list to specify multiple PCI bus IDs must be a comma-separated list without space characters.
External2 eth3 IMPORTANT: Be sure that the output represents the results you want before proceeding to the next task, Otherwise, repeat the procedure in this section until the output accurately represents the results you want. 22.7.3 Using the device_config Command The device_config command allows you to update the command and management database (CMDB) to incorporate one or more new external Ethernet ports, external1, external2, and so on.
IMPORTANT: configure. You need to repeat this procedure for each external Ethernet port that you 1. Review the information gathered from Table 22-1 (page 281). You will need the following information for the device_config command: • Node name • External device/port • IP address • External host name • Netmask • Gateway • MAC address 2. Enter the device_config command with the --dryrun option to perform a practice run.
NOTE: If you are using IPv6, you need to configure the /etc/sysconfig/ip6tables.proto file. The method for doing so is analogous to configuring the iptables.proto file. If a service is not aware of the external physical Ethernet port, it will not be able to communicate through its corresponding virtual ports unless you custom configure the firewall. As shipped, the firewall prototype file, /etc/sysconfig/iptables.
1 This line opens virtual port 443 for TCP on the first added physical external Ethernet port, External1, on all nodes in the HP XC system. The text -i External1 matches all nodes, so virtual port 443 will be open on all nodes with External1 connections. 3 This line opens the ftp virtual ports (20 and 21) on the first added physical external Ethernet port, External1, on node n19.
ipaddr: ipv6addr: mtu: name: netmask: Interconnect: device: gateway: hwaddr: iftype: ifusage: interface_number: ipaddr: ipv6addr: mtu: name: netmask: 192.0.2.2 station2.example.com 255.255.248.0 ipoib0 Infiniband Interconnect 172.22.0.16 ic-n19 255.255.0.0 . . . 22.7.6 Reconfiguring the Nodes After editing the platform_vars.ini file and updating the database, you need to reconfigure the nodes with the new NICs. Use the following procedure: 1. 2. Log in as superuser (root) on the head node.
22.7.7.1 Verifying the Ethernet Port Use the ifconfig command to verify that the Ethernet port for the NIC that you incorporated into the HP XC system is functioning correctly: # ifconfig eth2 eth2 Link encap:Ethernet HWaddr 00:00:00:00:00:02 inet addr:192.0.2.2 Bcast:192.0.2.100 Mask:255.255.248.
A Installing LSF with SLURM into an Existing Standard LSF Cluster This appendix describes how to join an HP XC system running LSF with SLURM (integrated with the SLURM resource manager) to an existing standard LSF Cluster without destroying existing LSF with SLURM configuration. After installation, the HP XC system is treated as one host in the overall LSF cluster, that is, it becomes a cluster within the LSF cluster.
• • • • • You should be familiar with the LSF installation documentation and the README file provided in the LSF installation tar file. You should also be familiar with the normal procedures in adding a node to an existing LSF cluster, such as: — Establishing default communications (rhosts or ssh keys) — Setting up shared directories — Adding common users The Standard LSF cluster is not configured with EGO (that is, the ENABLE_EGO environment variable is set to N during the installation of Standard LSF).
Before proceeding, read through all of these steps first to ensure that you understand what is to be done:
1. Log in as superuser (root) on the head node of the HP XC system.
2. Make sure that any current LSF application on the HP XC system is shut down and will not interfere. If LSF with SLURM is currently installed and running on the HP XC system, shut it down with the controllsf command:
   # controllsf stop
3. Consider removing this installation to avoid confusion:
   # /opt/hptc/etc/gconfig.
Use the shownode command to ensure that each node configured as a resource management node during the operation of the cluster_config utility also has access to the external network:
# shownode roles --role resource_management external
resource_management: xc[127-128]
external: xc[125-128]
If this command is not available, examine the role assignments by running the cluster_config command and viewing the node configurations. Be sure to quit after you determine the configuration of the nodes.
7. Modify the Head Node. These steps modify the head node and propagate those changes to the rest of the HP XC system. The recommended method is to use the updateimage command, as documented in Chapter 11: Distributing Software Throughout the System (page 141). Make the modifications first, then propagate the following changes:
   a. Lower the firewall on the HP XC external network. LSF daemons communicate through preconfigured ports defined in the lsf.conf file.
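The port numbers themselves are defined in lsf.conf; the values below are the commonly used LSF defaults and are listed only as an assumption to be checked against your own file:

LSF_LIM_PORT=7869
LSF_RES_PORT=6878
LSB_MBD_PORT=6881
LSB_SBD_PORT=6882

Whatever the actual values are, the firewall on the HP XC external network must allow traffic on these ports (and on LSB_RLA_PORT, which is set later in this appendix) between the HP XC nodes and the rest of the LSF cluster.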
    fi
esac

# cat lsf.csh
if ( "${path}" !~ *-slurm/etc* ) then
    if ( -f /opt/hptc/lsf/top/conf/cshrc.lsf ) then
        source /opt/hptc/lsf/top/conf/cshrc.lsf
    endif
endif

The goal of these custom files is to source (only once) the appropriate LSF environment file: $LSF_ENVDIR/cshrc.lsf for csh users, and $LSF_ENVDIR/profile.lsf for users of sh, bash, and other shells based on sh. Create /etc/profile.d/lsf.sh and /etc/profile.d/lsf.csh on the HP XC system to set up the LSF environment on HP XC.
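A matching lsf.sh, mirroring the csh logic above, might look like the following; the exact contents and the profile.lsf path under /opt/hptc/lsf/top/conf are assumptions:

# cat lsf.sh
case "${PATH}" in
    *-slurm/etc*)
        ;;
    *)
        if [ -f /opt/hptc/lsf/top/conf/profile.lsf ]; then
            . /opt/hptc/lsf/top/conf/profile.lsf
        fi
esac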
d. The HP XC controllsf command can double as the Red Hat /etc/init.d/ service script for starting LSF with SLURM when booting the HP XC system and stopping LSF with SLURM when shutting it down. When starting LSF with SLURM, the controllsf command establishes the LSF alias and starts the LSF daemons.
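One possible arrangement, assuming controllsf is installed in /opt/hptc/bin and that a symbolic link is an acceptable way to register it (both assumptions; your installation may already create /etc/init.d/lsf for you), is:

# ln -s /opt/hptc/bin/controllsf /etc/init.d/lsf
# pdsh -a 'ls -l /etc/init.d/lsf'

The second command, also used in the troubleshooting section of this appendix, confirms that the startup script is present on every node after the changes are propagated.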
2. Ensure that the head node and the resource management nodes resolve the host name of the Standard LSF head node, and vice versa. Also, make sure the LSF alias entry is present on the head nodes of both clusters.
   The plain node should have the following entries in its /etc/hosts file:
   192.0.2.128    xc-head
   192.0.2.140    xclsf
   The xc-head head node should have the following entries in its /etc/hosts file:
   192.0.1.24     plain
   192.0.2.140    xclsf
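Before continuing, you can confirm that the alias resolves the same way on both sides; getent consults /etc/hosts along with any other configured name services:

# getent hosts xclsf
192.0.2.140     xclsf
# ping -c 1 xclsf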
5. Start the LSF installation process:
# ./lsfinstall -f xc.config
Logging installation sequence in /shared/lsf/hpctmp/lsf7Update3_lsfinstall/Install.log

LSF pre-installation check ...
Checking the LSF TOP directory /shared/lsf ...
... Done checking the LSF TOP directory /shared/lsf ...
You have specified ***none*** to LSF_LICENSE, lsfinstall will continue without installing a license.
Checking LSF Administrators ...
lsfinstall is done.

To complete your LSF installation and get your cluster "stdlsfclus" up and running, follow the steps in "/shared/lsf/hpctmp/lsf7Update3_lsfinstall/lsf_getting_started.html".

After setting up your LSF server hosts and verifying your cluster "stdlsfclus" is running correctly, see "/shared/lsf/7.0/lsf_quick_admin.html" to learn more about your new LSF cluster.

After installation, remember to bring your cluster up to date by applying the latest updates and bug fixes.
c. Make sure the LSF_NON_PRIVILEGED_PORTS option is disabled or removed from this file (it is set to N by default). This option is not supported in Standard LSF Version 7.3, and you will get "bad port" messages from the sbatchd and mbatchd daemons on a non-HP XC system node.
d. If you use ssh for node-to-node communication, set the following variable in lsf.conf (assuming the ssh keys have been set up to allow access without a password):
   LSF_RSH=ssh
e. Add the RLA port details:
   LSB_RLA_PORT=6883
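Taken together, the lsf.conf settings from these steps would look something like the following fragment (a sketch assembled from the steps above, not a complete lsf.conf):

LSF_NON_PRIVILEGED_PORTS=N
LSF_RSH=ssh
LSB_RLA_PORT=6883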
# controllsf show
LSF is currently shut down, and assigned to node .
Failover is disabled.
Head node is preferred.
The primary LSF host node is xc128.
SLURM affinity is enabled.
The virtual hostname is "xclsf".

A.8 Starting LSF on the HP XC System
At this point, lsadmin reconfig followed by badmin reconfig can be run within the existing LSF cluster (on plain in our example) to update LSF with the latest configuration changes.
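In terms of commands, that sequence might look like the following; controllsf start is assumed to be the counterpart of the controllsf stop operation used earlier in this appendix:

On the Standard LSF master host (plain in our example):
$ lsadmin reconfig
$ badmin reconfig

On the HP XC head node:
# controllsf start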
A.9 Sample Running Jobs

Example A-1 Running Jobs as a User on an External Node Launching to a Linux Itanium Resource
$ bsub -I -n1 -R type=LINUX86 hostname
Job <411> is submitted to default queue .
<>
<>
plain

Example A-2 Running Jobs as a User on an External Node Launching to an HP XC Resource
$ bsub -I -n1 -R type=SLINUX64 hostname
Job <412> is submitted to default queue .
A.10 Troubleshooting
• Use the following commands to verify your configuration changes:
  — iptables -L and other options to confirm the firewall settings
  — pdsh -a 'ls -l /etc/init.d/lsf' to confirm the startup script
  — pdsh -a 'ls -ld /shared/lsf/' (using our running example) to confirm that the LSF tree was properly mounted
  — pdsh -a 'ls -l /etc/profile.d/lsf.
B Setting Up MPICH
MPICH, as described on its web site, http://www-unix.mcs.anl.gov/mpi/mpich1/, is a freely available, portable implementation of MPI. This appendix provides the information you need to set up MPICH on an HP XC system.
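As an orientation to the build steps that follow, this is a minimal sketch of configuring and building MPICH; the installation prefix is only an example, and any interconnect-specific configure options are site-dependent assumptions:

# ./configure -prefix=/opt/mpich-1.2.7
# make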
NOTE: Be sure to specify the directory with the -prefix= option.

6. Build MPICH with the make command:
   # make

NOTE: Building MPICH may take longer than 2 hours.

B.3 Running the MPICH Self-Tests
Optionally, you can run the MPICH self-tests with the following command:
% make testing
Two Fortran tests are expected to fail because they are not 64-bit clean. Tests that use ADIOI_Set_lock() fail on some platforms as well, for unknown reasons.
C HP MCS Monitoring
You can monitor the optional HP Modular Cooling System (MCS) by using the Nagios interface. During HP XC system installation, you generated an initialization file, /opt/hptc/config/mcs.ini, which specifies the names and IP addresses of the MCS devices. This file is used in the creation of the /opt/hptc/nagios/etc/mcs_local.cfg file, which Nagios uses to monitor the MCS devices.
6. a. Issue the following command:
      # /opt/hptc/config/sbin/mcs_config
   b. Restart Nagios. For more information, see “Stopping and Restarting Nagios” (page 115).
7. Examine the /opt/hptc/config/mcs_advExpected.static.db file to ensure that the values for the MCS advanced settings are appropriate for your site.
8. Restart Nagios if you changed this file. For more information, see “Stopping and Restarting Nagios” (page 115).
C.4 MCS Log Files
The following log files contain MCS-related data collected by the check_mcs_trends plug-in:
• /opt/hptc/nagios/var/mcs_trends.staticdb
  This log file tracks the following:
  — tempWaterIn
  — waterFlow
  — lastStatus
  — lastCheck
• /opt/hptc/nagios/var/env_logs/mcs_trends.log
  This log file tracks the following:
  — Hex1InTemp
  — Hex1OutTemp
  — Hex2InTemp
  — Hex2OutTemp
  — Hex3InTemp
  — Hex3OutTemp
  — waterInTemp
  — waterOutTemp
  — waterFlowRate
  — waterFlowRetries
  In this file, the data is stored in a row.
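To see the most recent samples that the plug-in has recorded, you can view the end of the trends log (a simple inspection command, shown only as an example):

# tail /opt/hptc/nagios/var/env_logs/mcs_trends.log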
Figure C-1 MCS Hosts in Nagios Service Details Window
D CPU Frequency-Based Power-Saving Feature
This version of the HP XC System Software introduces the CPU frequency-based power-saving feature, which adjusts the CPU clock frequency to reduce CPU power consumption. This feature sets a node's CPU clock frequency to its minimum when the node is idle and to its highest setting when the node runs a Standard LSF job. This feature is enabled only for HP XC systems that fulfill the following conditions:
• HP CP3000BL and CP4000BL platform servers.
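To observe the effect on a node, and assuming the feature builds on the standard Linux cpufreq interface (an assumption; the HP XC implementation details are not described here), you can inspect the current governor and clock frequency through sysfs:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq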
Glossary

A

administration branch
    The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes of the HP XC system.

administration network
    The private network within the HP XC system that is used for administrative operations.

availability set
    An association of two individual nodes so that one node acts as the first server and the other node acts as the second server of a service. See also improved availability, availability tool.

operating system and its loader. Together, these provide a standard environment for booting an operating system and running preboot applications.

enclosure
    The hardware and software infrastructure that houses HP BladeSystem servers.

extensible firmware interface
    See EFI.

external network node
    A node that is connected to a network external to the HP XC system.

F

fairshare
    An LSF job-scheduling policy that specifies how resources should be shared by competing users.

image server
    A node specifically designated to hold images that will be distributed to one or more client systems. In a standard HP XC installation, the head node acts as the image server and golden client.

improved availability
    A service availability infrastructure that is built into the HP XC system software to enable an availability tool to fail over a subset of eligible services to nodes that have been designated as a second server of the service. See also availability set, availability tool.

LVS
    Linux Virtual Server. Provides a centralized login capability for system users. LVS handles incoming login requests and directs them to a node with a login role.

M

Management Processor
    See MP.

master host
    See LSF master host.

MCS
    An optional integrated system that uses chilled water technology to triple the standard cooling capacity of a single rack. This system helps take the heat out of high-density deployments of servers and blades, enabling greater densities in data centers.

onboard administrator
    See OA.

P

parallel application
    An application that uses a distributed programming model and can run on multiple processors. An HP XC MPI application is a parallel application. That is, all interprocessor communication within an HP XC parallel application is performed through calls to the MPI message passing library.

PXE
    Preboot Execution Environment.

an HP XC system, the use of SMP technology increases the number of CPUs (amount of computational power) available per unit of space.

ssh
    Secure Shell. A shell program for logging in to and executing commands on a remote computer. It can provide secure encrypted communications between two untrusted hosts over an insecure network.

standard LSF
    A workload manager for any kind of batch job.
Index Symbols A actualizing planned nodes, 270–272 adding a local user account, 159 adding a node, 267 adding a service, 74 administrative passwords, 162–167 archive.
HP XC commands in, 33 D database cannot connect, 247 dbsysparams command, 31, 81, 132 deadman module, 139 deleting a local user account, 161 device_config command, 31, 285 dgemm utility, 42, 239 DHCP service, 25 diagnostic tools dgemm, 239 Gigabit Ethernet system interconnect, 246 gm_drain_test, 241 gm_prodmode_mon, 240 InfiniBand system interconnect, 245 Myrinet system interconnect, 240 ovp, 233 qsdiagadm, 242 qselantestp, 242 qsnet2_drain_test, 245 qsnet2_level_test, 243 Quadrics system interconnect, 241
hostgroup command, 31 HowTo web address, 17 HP BladeSystems information, 81 HP documentation providing feedback for, 21 HP Graph, 95–99 HP Serviceguard, 47–49 HP XC command set, 30 configuration file guidelines, 35 HP XC system booting, 51 file system hierarchy, 26 log files, 29 shutdown, 54 startup, 51 hpasm, 87 /hptc_cluster directory, 27, 58, 146, 262, 264 guidelines, 27 troubleshooting mount failure, 248 I I/O service, 25 image replication and distribution, 141 exclusion files, 150 image server service
monitoring, 203 resource information, 202 short RUN_WINDOW for queue, 264 shutting down, 197 starting up, 196 troubleshooting, 263–265 LSF with SLURM failover, 204, 264 running jobs, 205 LSF with SLURM integration, 190 LSF with SLURM interplay, 212 LSF with SLURM jobs controlling, 198 monitoring, 198 lsf.
server, 131 service, 25 nconfigure script defined, 63 Network Address Translation (see NAT) network bandwidth, 95 network boot, 149, 150 Network File System (see NFS) Network Information Service (see NIS) Network Interface Cards (see NICs) Network Time Protocol (see NTP) NFS, 217 attribute caching, 248 mount options, 248 RPC services, 217 troubleshooting mount failure, 248 nicbond.
RPM updates, 137 RRDtool, 95 rsync utility, 51 S sacct command, 32, 179 /sbin directory, 26 sconfigure script defined, 63 scontrol command, 100, 182, 263 scp command, 41 secure shell, 41 security, 41 security patches, 137 Self-Monitoring Analysis and Reporting Technology (see SMART) sendmail utility, 117 sensor thresholds changing, 117 server blades information, 81 service, 23, 57 adding, 74 central control daemon service, 24 compute service, 24 configuration and management database, 25 configuration files
ssh keys, 161 troubleshooting mismatched, 248 ssh_create_shared_keys command, 32, 161 stale metrics data troubleshooting, 248 startsys command, 32, 52, 53, 54, 56, 147, 151, 264, 268, 270, 274 stopsys command, 32, 54, 151 striping, 229 Supermon, 58, 59, 85, 86, 105 supermond service, 59 superuser password changing, 162 swmlogger daemon, 241 sys_check utility, 32, 42, 233 syslog service, 58, 86, 90 syslog-ng configuration files, 37 syslog-ng rules files modifying, 91 templates, 91 syslog-ng service, 86 syslo