HP XC System Software Administration Guide Version 3.
© Copyright 2003, 2004, 2005, 2006 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
About This Document This document describes the procedures and tools that are required to maintain the HP XC system. It provides an overview of the administrative environment and describes administration tasks, node maintenance tasks, Load Sharing Facility (LSF®) administration tasks, and troubleshooting information. An HP XC system is integrated with several open source software components. Some open source software components are being used for underlying technology, and their deployment is transparent.
• New section on switching the type of LSF installed.
• New chapter on software RAID.
• The section on the ovp diagnostic tool is expanded for the new Performance Health tests.
• New section for General Troubleshooting.
• New section for Nagios Troubleshooting.
• New section on software RAID disk replacement.
• New appendix for MPICH, a freely available, portable implementation of the message passing libraries.
• New appendix for Modular Cooling System (MCS) monitoring.
The HP XC System Software Documentation Set includes the following core documents: HP XC System Software Release Notes Describes important, last-minute information about firmware, software, or hardware that might affect the system. This document is available only on line. HP XC Hardware Preparation Guide Describes hardware preparation tasks specific to HP XC that are required to prepare each supported hardware model for installation and configuration, including required node and switch connections.
HP Integrity and HP ProLiant Servers Documentation for HP Integrity and HP ProLiant servers is available at the following Web site: http://www.docs.hp.com/en/hw.html 5 Related information Supplementary Software Products This section provides links to third-party and open source software products that are integrated into the HP XC System Software core technology.
• http://supermon.sourceforge.net/ Home page for Supermon, a high-speed cluster monitoring system that emphasizes low perturbation, high sampling rates, and an extensible data protocol and programming interface. Supermon works in conjunction with Nagios to provide HP XC system monitoring. • http://www.llnl.gov/linux/pdsh/ Home page for the parallel distributed shell (pdsh), which executes commands across HP XC client nodes in parallel. • http://www.balabit.
MPI Web Sites • http://www.mpi-forum.org Contains the official MPI standards documents, errata, and archives of the MPI Forum. The MPI Forum is an open group with representatives from many organizations that define and maintain the MPI standard. • http://www-unix.mcs.anl.gov/mpi/ A comprehensive site containing general information, such as the specification and FAQs, and pointers to a variety of other resources, including tutorials, implementations, and other MPI-related sites.
$ man discover
$ man 8 discover
If you are not sure about a command you need to use, enter the man command with the -k option to obtain a list of commands that are related to a keyword. For example:
$ man -k keyword
7 HP Encourages Your Comments
HP encourages comments concerning this document. We are committed to providing documentation that meets your needs. Send any errors found, suggestions for improvement, or compliments to: feedback@fc.hp.
1 HP XC Administration Environment This chapter introduces the HP XC Administration Environment.
1.1.1.2 Local Storage The local storage for each node holds the operating system, a copy of the HP XC system software, and temporary space that can be used by jobs. When possible, ensure that jobs that use local storage clean up files after they are run. You might need to clean up temporary storage on local machines if jobs do not do so adequately. 1.1.2 Services A service is the software that runs on a node to provide a given function.
The node assigned this service is the MySQL configuration database server for this system. For additional information on the configuration database, see "Configuration and Management Database" (page 34).
• Dynamic Host Configuration Protocol (DHCP) server
This service assigns IP addresses to all devices on the internal network based on each device's Media Access Control (MAC) address.
/etc
Contains files for the configuration of the system and components of the system, including networking information, printers, and so on.
/hptc_cluster
Reserved for the exclusive use of the HP XC System Software. This directory is a file system that is mounted on all the nodes in the HP XC system. This systemwide directory contains key directories and files for global system use. See "Systemwide Directory, /hptc_cluster" (page 28) for more information.
This directory is mounted on the head node. You must ensure the persistence of this file system mount. Caution: Do not add, replace, or remove any files in the /hptc_cluster directory. Doing so will cause the HP XC system to fail. 1.2.1.2 HP XC System Software Directory, /opt/hptc The HP XC System Software maintains the /opt/hptc directory for its exclusive use. Its software is installed in that directory.
1.2.1.3 HP XC Service Configuration Files The /opt/hptc/etc/ directory includes several subdirectories containing scripts used to configure services on nodes at installation time. The /opt/hptc/etc/sconfig.d directory contains scripts for system configuration. The /opt/hptc/etc/gconfig.d directory contains scripts used to gather information needed to configure a service on the HP XC system. The /opt/hptc/etc/nconfig.
1.3.1 HP XC Command Set
Table 1-2 lists the commands alphabetically and provides a brief description. For more information on an individual command, see the corresponding manpage.
Table 1-2 HP XC System Commands
cexec
The cexec command is a shell script that invokes the pdsh command to perform commands on multiple nodes in the system. Manpage: cexec(1)
clusplot
The clusplot utility graphs the data from the data files generated with the xcxclus utility.
Table 1-2 HP XC System Commands (continued) Command Description ovp Use the ovp utility to verify the installation, configuration, and operation of the HP XC system. Manpage: ovp(8) perfplot The perfplot utility graphs the data from the data files generated with the xcxperf utility. This utility is described in the HP XC System Software User's Guide. Manpage: perfplot(1) power Use the power command to control the power for a set of nodes and to interrogate their current state.
1.3.2 The nodelist Parameter
The nodelist parameter, used in several HP XC system commands, indicates one or more nodes. You can use brackets, hyphens, and commas in the nodelist parameter:
[ ]  Brackets indicate a set of nodes. You can use only one set of brackets for each instance of the nodelist parameter.
-    A hyphen indicates a range of nodes.
,    A comma separates node definitions.
For example, the nodes n1 n2 n3 n5 can be expressed in the nodelist parameter as n[1-3,5].
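A command that accepts a nodelist can address those same nodes directly. The following is a minimal illustration using the pdsh command discussed in the next section; the node names and the uptime command are examples only:
# pdsh -w n[1-3,5] uptime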
For additional information, see pdsh(1). You can also find additional information at the following Web site: http://www.llnl.gov/linux/pdsh/ 1.3.3.2 Using the cexec Command The cexec command provides the same facility as the pdsh command plus two additional features: • The cexec command provides an additional option, which enables you to specify a host group or a service group: — A host group is a designated list of nodes.
• Processors
• MAC addresses
• IP addresses
• Switch ports
• Services
• Roles
• Metrics
The following custom management tools access the data stored in the CMDB:
• The shownode command searches the database and displays a list of services defined for a given parameter.
• The dbsysparams command searches the database and displays the value of the given attribute.
• The managedb command enables you to archive metrics data from the CMDB, back it up, and dump (display) the CMDB data.
Software Upgrade During a software upgrade, the HP XC System Software defers to the Linux rpm software, which has the responsibility for ensuring the preservation of any Linux configuration files you have customized. For more information, see rpm(8). 1.5.
Table 1-3 HP XC Configuration Files (continued) Component Referenced in Configuration Files Services Chapter 4 (page 55) /opt/hptc/config/roles_services.ini /opt/hptc/config/etc/services/*.ini SLURM Chapter 14 (page 157) /hptc_cluster/slurm/etc/slurm.
1.6.2 Software Distribution The HP XC system uses the SystemImager tool to synchronize the configuration of nodes across the cluster using image propagation. This simplifies installation of the initial software and upgrading software and configuration files. For more information, see Chapter 10: Distributing Software Throughout the System (page 129). 1.
If you notice a problem with the clocks synchronizing on any nodes, verify the internal server's /etc/ntp.conf file and the ntp.conf file on the nodes that are experiencing the problem. Other tools, such as ntpq and ntpdc, are also available. For more information, see ntpd(1), ntpq(1), and ntpdc(1) and the ntp.conf file.
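As a quick first check, log in to a node that appears to be drifting and query its NTP peers with the standard ntpq utility; nonzero reach values and a selected peer (marked with *) indicate that synchronization is working:
# ntpq -p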
# module unload package-name See the HP XC System Software User's Guide for more information about modulefiles. Notes: Installing a package in a nondefault location means that you must update the corresponding modulefile; you might need to edit the PATH and MANPATH environment variables. Other changes are based on the software package and its dependencies. If you have installed a variant of the package, you might need to create a parallel modulefile specifically for the variant.
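After editing or adding a modulefile, the standard Modules commands can confirm what is visible and what is loaded; this is a minimal sketch, and any modulefile names it reports depend on your installation:
# module avail    # modulefiles found on the current MODULEPATH
# module list     # modulefiles loaded in this shell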
For security purposes, use the ssh command instead of these other, much less secure alternatives. To use ssh without password prompting with a user account, you must set up ssh authentication keys. For information on configuring the ssh keys, see “Configuring the ssh Keys for a User” (page 151). 1.11 Recommended Administrative Tasks Table 1-4 lists recommended administrative tasks. Consider these recommendations when establishing a site policy for administering your HP XC system.
2 Improved Availability
The improved availability feature of the HP XC system offers the following benefits:
• It enables services and, thus, user jobs, to continue to run, even after a node failure.
• It enables you to run new jobs.
The improved availability feature relies on an availability tool controlling nodes and services in an availability set. The HP XC System Software provides commands to transfer control of services to the availability tool.
2.3 Availability Sets
A set of nodes is designated as an availability set during the configuration of the HP XC System Software. These nodes provide failover and failback functionality. One node in the availability set is typically the primary provider for the service. When the availability tool senses that the primary node has failed, the other node in the availability set assumes the function of that service.
NOTE: In the examples in this section, assume the PATH environment variable has been updated for Serviceguard commands. 2.4.1 Viewing the Serviceguard Cluster Status View the status of the Serviceguard cluster (that is, the availability set). The Serviceguard cmviewcl command is a key component of this procedure. NOTE: The /usr/local/cmcluster/bin directory is the default location of the cmviewcl command. If you installed Serviceguard in a location other than the default, look in the /etc/cmcluster.
n14: n14: nagios.n16 dbserver.n16 up up running running enabled enabled n16 n16 The cmviewcl command views the status of all Serviceguard clusters that correspond to the avail1 and avail2 availability sets. 2.4.2 Moving Packages The following example describes how to relocate packages manually within a Serviceguard cluster (availability set). You can use the steps shown in this example to failback HP Serviceguard packages manually after an automatic failover. 1. 2.
2.5 Transferring Control of Services The HP XC System Software provides these commands to transfer control of services to and from the availability tool: • transfer_to_avail The transfer_to_avail command transfers the control of services from the HP XC system to the availability tool. • transfer_from_avail The transfer_from_avail command transfers the control of services from the availability tool back to the HP XC system. IMPORTANT: You must issue these commands from the head node.
restoreServices: ========== '/usr/bin/ssh nh /opt/hptc/etc/nconfig.d/C12dbserver nxferfrom n16' finished, exited with 0(0) restoreServices: Info: DB up and running restoreServices: ========== Executing '/opt/hptc/bin/pdsh -S -w n16 /opt/hpt c/etc/nconfig.d/C50nat nxferfrom n16'... restoreServices: ========== '/opt/hptc/bin/pdsh -S -w n16 /opt/hptc/etc/ncon fig.d/C50nat nxferfrom n16' finished, exited with 0(0) restoreServices: ========== Executing '/opt/hptc/bin/pdsh -S -w n14 /opt/hpt c/etc/nconfig.
3 Starting Up and Shutting Down the HP XC System
This chapter addresses the following topics:
• "Understanding the Node States" (page 49)
• "Starting Up the HP XC System" (page 50)
• "Shutting Down the HP XC System" (page 52)
• "Shutting Down One or More Nodes" (page 52)
• "Determining a Node's Power Status" (page 52)
• "Locating a Given Node" (page 53)
• "Disabling and Enabling a Node" (page 53)
Figure 3-1 Transition of Node States A node that does not require imaging is powered on in the POST state, enters the Booting state when it is booting, and is in the Available state when it is ready for use in the HP XC system. A node that requires imaging is considered to be in the Raw_Off state until it is powered on. Then it enters the POST state. When the node is imaged, it is in the Imaging state. After completing imaging, the node enters the Boot_Ready state and is returned to the POST state.
For additional information about the startsys command, see startsys(8). The following sections provide procedures for starting nodes that do not need imaging, for imaging and starting nodes, and for restarting a node to image it. 3.2.1 Starting All Nodes The following procedure describes the steps necessary to start the HP XC system from the head node. In this procedure, all the enabled nodes that do not require imaging are booted.
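The core of that procedure is the startsys command, run as superuser on the head node. The following is a minimal sketch, assuming the default invocation boots all enabled nodes that do not require imaging, as described above; see startsys(8) for imaging and other options:
# startsys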
# startsys --force --image_and_boot node
3.3 Shutting Down the HP XC System
The stopsys command implements the actions to be taken to shut down the system. The following procedure describes the steps necessary to shut down the HP XC system:
1. Log in to any node as superuser. The head node is the preferred node from which to shut down the system.
2. Shut down the system with the stopsys command, as shown below.
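A minimal sketch of that invocation, assuming the default behavior of stopsys with no options is to shut down every enabled node; see stopsys(8) for the supported options:
# stopsys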
# shownode status n1 n2
n2 ON
n1 ON
You can request the power status of all nodes by omitting the nodelist parameter. This returns all the nodes whose status is either ON or OFF:
# shownode status
. . .
n2 ON
n1 ON
3.6 Locating a Given Node
Most nodes supported on the HP XC system have a Unit Identifier LED that can be lit; not all nodes have this feature. Follow this procedure to light the Unit Identifier LED on a node with this feature, given its nodename:
1. Log in as superuser.
4 Managing and Customizing System Services
This chapter describes the HP XC system services and the procedures for their use. This chapter addresses the following topics:
• "HP XC System Services" (page 55)
• "Displaying Services Information" (page 57)
• "Restarting a Service" (page 59)
• "Stopping a Service" (page 60)
• "Adding a New Service" (page 71)
• "Global System Services" (page 60)
• "Customizing Services and Roles" (page 60)
Table 4-1 Linux and third-party System Services (continued)
LVS Director (database name: lvs)
Handles the placement of user login sessions on nodes when a user logs in to the cluster alias.
NAT Server (database name: nat)
Network Address Translation server.
NAT Client (database name: nat_client)
Network Address Translation client.
Network Adapter Setup (database name: network)
Gathers information about all network adapters and stores it in the management database.
NFS Server
Runs an NFS server for sharing file systems.
Management hubs are used to aggregate system information for a set of local nodes. Each management hub runs the following services: • Supermon aggregator (supermond) • Syslog-ng aggregator (syslogng_forward) • Console Management Facility (CMF) • Nagios (nagios_monitor) “Displaying Services Information” (page 57) describes commands that provide information on services, that is, how to display a list of services, which services are provided on a given node, and which nodes provide a given service. 4.
# service --status-all 4.2.2 Displaying the Nodes That Provide a Specified Service You can use the shownode servers command to display the node or nodes that provide a specific service to a given node. You do not need to be superuser to use this command. This example shows how to determine which nodes provide the supermon service: # shownode servers supermon n3 The command output indicates that only the node n3 supplies the supermon service.
dhcp: n3
lsf: n3
munge: n3
nagios: n3
nagios_monitor: n3
nsca: n3
ntp: n3
slurm: n3
supermond: n3
syslogng_forward: n3
The shownode services node server command displays no output if no server exists. For more information, see shownode(8).
4.3 Restarting a Service
The method to use to restart a service depends on whether or not improved availability is in effect for that service.
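When improved availability is not in effect for a service, the usual approach is to restart it with the standard service command on the node that provides it. A minimal sketch; the node and service names here are examples only:
# pdsh -w n3 /sbin/service ntp restart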
4.4 Stopping a Service
The method to use to stop a service depends on whether or not improved availability is in effect for that service.
• • • • • • • • • “Assigning Roles with the cluster_config Utility” (page 62) “The *config.d Directories” (page 62) “Configuration Scripts” (page 62) “Understanding Global Configuration Scripts” (page 67) “Advance Planning” (page 67) “Editing the roles_services.ini File” (page 68) “Creating a service.ini File” (page 68) “Adding a New Service” (page 71) “Verifying a New Service” (page 73) 4.6.
1. 2. 3. 4. 5. 6. 7. Service-specific attributes are made available to the cluster_config utility in service-specific *.ini files. As the superuser (root), you run the cluster_config utility on the head node to configure the HP XC system. You assign roles to nodes (and thus, services) through the cluster_config text-based user interface menu options.
• • “The nconfigure Scripts” (page 66) “The cconfigure Scripts” (page 66) Common Conventions When a node list is passed as an argument to the configuration scripts, the node list is passed in reverse numeric order, which means that the head node is first if it is present in the list.
The gconfigure scripts return 0 (zero) on success and return a nonzero value on failures. You can stop the configuration process on a nonrecoverable gconfigure error, which is indicated by the gconfigure script exiting with a return code of 255. Alternatively, you can call config_die() in ConfigUtils.pm to return 255. Any exit code other than 255 allows the configuration process to continue.
Example 4-1 Sample gconfig Script: Client Selection and Client-to-Server Assignment my($service) = 'my_new_service'; sub gconfigure { my(@servers) = @_; cs_reset_service($service,@servers); 1 my($client_flags) = ''; # "all selected nodes", default 2 my($assignment_flags) = ''; # "client may be assigned to itself", default # Make a server set from the server list my($serverset) = Set::Node->select_by_name(@servers); 4 # Get my client list # Client list is "all nodes" my($clientset) = cs_get_all_clients 5 ($s
• sa_locality
Assigns clients to servers that have similar traffic patterns to limit the amount of network traffic.
• sa_default_assignment
Assigns the default assignment policy, which is the policy HP decided is the most useful for most of the services.
NOTE: If necessary, you can combine patterns using a pipe character (|).
The cconfigure scripts are executed inside the cluster_config utility as follows:
script_path cconfigure client server1 server2 ...
script_path cunconfigure client server1 server2 ...
The client is the node name on which the script is running. The servers are the nodes providing this service to this client. Typically, this is a single server. The cconfigure action is performed only on nodes that are clients of the service.
4.6.8 Editing the roles_services.ini File
Adding a new service to the roles model requires you to do one of the following:
• Add the service to an existing role or roles
• Add a new role or roles for the new service and add the new service into the new role or roles
You can accomplish both these tasks by editing the /opt/hptc/config/roles_services.ini file and adding the new service name to an existing role or a new role.
and services for which no corresponding .ini file can be found are configured according to the default.ini file. Evaluate any new service regarding whether or not you need to create a service-specific service.ini file with the appropriate parameters. The default.ini file is available and may be suitable for new services. Review its contents to make sure it is compatible with your new service before allowing the cluster_config utility to use it as the default for your new service.
Do not assign 1 to both the head_node_required and head_node_desired parameters. If you assign 1 to the head_node_required parameter, assign 0 to the head_node_desired parameter.
head_node_desired
Assigning 1 to this parameter indicates that running the service on the head node is beneficial but not necessary. Assigning 0 indicates that the service does not benefit from running on the head node. Do not assign 1 to both the head_node_required and head_node_desired parameters.
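A hypothetical fragment of a service-specific .ini file illustrating these two parameters; the simple key = value form is assumed here, and the values describe a service that prefers, but does not require, the head node:
head_node_required = 0
head_node_desired = 1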
Role HN HN Ext Ext Exc Recommend Assigned Rec Req Rec Req Rec Rec Role ---------------------------------------------------------------3 3 compute 1 1 disk_io 1 1 external (optional) 1 0 login (optional) 1 1 management_hub 1 1 management_server 1 0 nis_server (optional) 1 1 resource_management The column headings in the middle of the report correspond to parameters in the service.
4. Use the text editor of your choice to edit the roles_services.ini file as follows: a. Add the name of the new service to the stanza that lists the services. services = <
3. Edit the roles_services.ini file as follows: a. Add the name of the new role to the stanza that lists all the roles: roles = <
3. After cluster_config processing has completed, use the following command to verify the servers and server-to-client relationships to ensure they are configured as you expected:
# shownode config
4. After the system is started, verify that the nconfig and cconfig scripts performed correctly, on the correct nodes, by examining the /var/log/nconfig.log file on nodes other than the head node.
5 Managing Licenses
This chapter describes the following topics:
• "License Manager and License File" (page 75)
• "Determining If the License Manager Is Running" (page 75)
• "Starting and Stopping the License Manager" (page 75)
5.1 License Manager and License File
The license manager service runs on the head node and maintains licensing information for software on the HP XC system. You can find additional information on the FLEXlm license manager at the Macrovision Web site: http://www.macrovision.
5.3.1 Starting the License Manager
Use the following command to start the license manager:
# service hptc-lm start
5.3.2 Stopping the License Manager
Use the following command to stop the license manager:
# service hptc-lm stop
6 Managing the Configuration and Management Database The configuration and management database, CMDB, is key to the configuration of the HP XC system. It keeps track of which nodes are enabled or disabled, the services that a node provides, the services that a node receives, and so on.
ipaddr: level: location: netmask: region: cp-n2: cp_type: host_name: hwaddr: ipaddr: level: location: netmask: region: 172.21.0.1 1 Level 1 Switch 172.20.65.2, Port 40 255.224.0.0 0 IPMI cp-n2 00:e0:8b:01:02:04 172.21.0.2 1 Level 1 Switch 172.20.65.2, Port 41 255.224.0.0 0 . . .
# shownode servers ntp n3
n3
This command output indicates that the node n3 supplies the ntp service for itself.
6.2.3 Displaying Blade Enclosure Information
You can use the shownode command to provide information for the HP XC system with HP BladeSystems. With this command, you can:
• Display the names of the blade enclosures and the nodes (server blades) within them.
• List all the blade enclosures in the HP XC system and, for each, the nodes within them.
6.6 Archiving Sensor Data from the Configuration Database The managedb archive command enables you to remove sensor data records from the log tables. These log tables are any tables in the configuration and management database that end in .Log. Archiving has the advantage of decreasing the size of the log tables, which enables the shownode metrics command to run more quickly. You must be superuser to use this command.
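To shrink the log tables without discarding recent data, archive records older than a cutoff. A minimal sketch, assuming the same time-parameter syntax shown below for managedb purge (here, two weeks):
# managedb archive 2w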
You can specify a time parameter with the managedb purge command; sensor data older than the time parameter is purged. The time parameter is the same as for the managedb archive command. See “Archiving Sensor Data from the Configuration Database” (page 80) for a description of the time parameter. The following command purges sensor data older than two weeks: # managedb purge 2w For more information on the managedb command, see managedb(8). The archive.
7 Monitoring the System
System monitoring can identify situations before they become problems. This chapter addresses the following topics:
• "Monitoring Tools" (page 83)
• "Monitoring Strategy" (page 84)
• "Displaying System Environment Data" (page 85)
• "Monitoring Disks" (page 85)
• "Displaying System Statistics" (page 85)
• "Logging Node Events" (page 87)
• "The collectl Utility" (page 89)
• "HP Graph" (page 92)
• "The netdump and crash Utilities" (page 96)
The syslog service runs on each node in the HP XC system. These daemons capture log information and send it to an aggregator regional node. Regional nodes are assigned to each client node. The syslogng_forward service on each regional node enables the node to act as a log aggregator for the global node. Log information is gathered, consolidated, and forwarded to the global node; the global node is not necessarily the head node.
database. The Root Supermon also connects to all other Supermon services and manages a subset of nodes. The syslog daemons report events to the syslog-ng services. Other tools, such as collectl, work independently from this structure.
7.3 Displaying System Environment Data
The HP XC System Software uses the Nagios monitoring application to gather and display environment data.
--------------------------------------------------------------------------date and time stamp |n14 |Sensor count |29 |Sensors within threshold; ok Individual sensors are displayed only when a sensor is out of range. Otherwise a sensor count is displayed. Invoking the command without specifying a node displays the sensor data for all the nodes in the HP XC system as the following example shows. The output is truncated horizontally to fit on the page.
# shownode metrics mem Timestamp |Node |Total |Free |Buffer |Shared |TotalHigh |TotalFree -------------------------------------------------------------------------------------date and time stamp |n14 |4037620 |3922916 |5864 |0 |0 |0 date and time stamp |n15 |4037608 |2835972 |6720 |0 |0 |0 date and time stamp |n16 |8005316 |107448 |183388 |0 |0 |0 7.5.
node name of the aggregator. The aggregator nodes forward their clients' events to the master aggregator node, which produces a consolidated log file, /hptc_cluster/adm/logs/consolidated.log. The assignment of regional and global nodes is made during the execution of the gconfig utility during installation. You can determine which nodes are the regional nodes with the shownode command: # shownode config syslogng_forward The shownode command identifies the nodes that supply the syslogng_forward service.
3. Examine both template files in the global and regional directories to determine which template file applies. In this example, you must edit both template files.
4. Make a backup copy of the template file or files that you will modify.
# cp regional/syslog_ng_regional_template regional/template_backup
# cp global/syslog_ng_global_template global/template_backup
5. Use the editor of your choice to modify the template files.
IMPORTANT: Keep a record of the changes you make in the template files.
7.7.1 Running the collectl Utility from the Command Line The default action of this utility is to collect data at 10-second intervals and to display the data in ASCII characters on the terminal screen. Example 7-1 shows the invocation and first record reported from the collectl utility. The information has been edited to fit horizontally on the page. Example 7-1 Using the collectl Utility from the Command Line # collectl waiting for 10 second sample... ### RECORD 1 >>> n3 <<< (m.
The collectl service is set up to collect normally reported summary data and to write it in a compressed text file in the /var/log/collectl directory. The actions of the collectl service are specified by the /opt/hptc/config/services/collectl.ini file.
7.8 HP Graph
The RRDtool software is integrated into the HP XC system to create and display graphs of network bandwidth and other system utilization. You can access this display by selecting HP Graph in the Nagios menu. Figure 7-4 is an example of the default display. It provides an overview of the system with graphs for node allocation, CPU usage, memory, Ethernet traffic, and, if relevant, Interconnect traffic.
Figure 7-4 HP Graph System Display By selecting an item in the menu in the upper left-hand side, you can specify the graphical data on any Nagios host. Figure 7-5 shows the graphical data for one node on the system. 7.
NOTE: The detail graphs for a system display show the graphs for a specified metric on all the Nagios hosts. The detail graphs for a Nagios host display show all the applicable metrics for that Nagios host.
Figure 7-5 HP Graph Host Display 7.
The Metric menu influences the display of the detail graphs for a system display. This menu offers the following choices:
bytes in
This graph reports the rate of data received on all network devices on the node.
bytes out
This graph reports the rate of data transmitted on all network devices on the node.
cpu idle
This graph indicates how much of the node's CPU set was available for other tasks.
cpu iowait
cpu system
cpu usage
NOTE: The crash utility is designed to examine an uncompressed kernel image (a vmlinux file) that was compiled with the compiler's -g option so that it can be debugged. Consider editing the kernel Makefile to add the -g option to the CFLAGS line.
7.9.1 Installing Netdump and Crash
Two RPMs for Netdump are available:
• The netdump client runs on the nodes to be monitored.
• The netdump-server runs on the nodes that can receive the kernel dumps over the internal network.
cat /etc/sysconfig/netdump_id_dsa.pub | \ ssh netdump@$NETDUMPADDR cat '>>' /var/crash/.ssh/authorized_keys2 Be prepared to supply the superuser password on the netdump-server node. If you do not want to set a password for the netdump user, enter the following command: cat /etc/sysconfig/netdump_id_dsa.pub | \ ssh root@$NETDUMPADDR cat '>>' /var/crash/.ssh/authorized_keys2 7.9.2.
See crash(8) for more information. 7.9.6 Using the Crash Utility to Analyze a Live System You must log in as superuser (root) to analyze a system while it is running. Use the following command to start the crash utility: # crash vmlinux.dbg NOTE: In this example, the vmlinux.dbg file is the uncompressed kernel image (that is, a vmlinux file) that has been compiled with the -g option. The vmlinux.dbg file is the same version as the live system kernel. See crash(8) for more information. 7.
8 Monitoring the System with Nagios The Nagios open source application has been customized and configured to monitor the HP XC system and network health. This chapter introduces Nagios and discusses these modifications.
8.1.1 Nagios Components
The components that comprise Nagios are as follows:
Nagios
• Nagios engine
• Nagios Web interface
• Nagiostats tool
Standard Plug-Ins
These plug-ins are not configured for any particular system. Although they are all provided, not all these plug-ins are used on HP XC systems.
8.1.5 Nagios Files
The following lists the files and directories that are important to Nagios configuration for HP XC:
/opt/hptc/nagios/bin
Contains the Nagios binaries.
/opt/hptc/nagios/libexec
Contains plug-ins specific to Nagios and to the HP XC system.
/opt/hptc/nagios/etc
Contains the Nagios configuration values.
NOTE: Files having file names of the form xc*.cfg and *_local.cfg are generated during and as a result of the nconfig process. Do not modify these files manually. Modify the file nagios.
You can choose any of the options on the left navigation bar. These options are shown in Figure 8-2. Figure 8-2 Nagios Menu (Truncated) After you chose an option from the window, you are initially prompted for a login and a password. This login and the password were established when the HP XC system is configured. Usually, the login name is nagiosadmin. The Nagios passwords are maintained in the /opt/hptc/nagios/etc/htpasswd.users file.
The term Hosts on the Nagios window refers to any entity with an IP address, not just nodes. For example, Nagios monitors the 1,024 nodes and four switches in an HP XC system, and reports on the status of 1,028 hosts. SFS is also an example of a Nagios host; Nagios finds the name of the SFS server and displays its status. Keep this in mind when using the Nagios application. The HP XC System Software provides plug-ins that monitor these and other system statistics. 8.2.
Figure 8-4 Nagios Service Detail View The Status column identifies problems. In this example, the Status column flags a problem with the head node's Slurm Monitor. Selecting a link for a Nagios service in the Service column opens the Nagios Service Information view for the corresponding Nagios service. For example, selecting the Slurm Monitor link in this example opens the following view, as shown in Figure 8-5.
Figure 8-5 Nagios Service Information View You can also use the Nagios report generator utility, nrg, to obtain an analysis of the Nagios service (plug-in). Select the analyze option to display a two-column listing of service status. The following is the command line entry for this feature: # nrg --mode analyze nh [Slurm Monitor - Critical] 'sinfo' reported problems with nodes in some partitions, specifically, some nodes may be marked with an '*' which indicates they may be unresponsive to SLURM.
Figure 8-6 Nagios Service Problems View Selecting the link corresponding to the Nagios Host opens the Nagios Host Information view for that Nagios host. Figure 8-7 is an example of the Nagios Host Information view displayed by selecting the link for xc10n4 in the Nagios Service Problems view shown in Figure 8-6.
Figure 8-7 Nagios Host Information View You can also use the Nagios report generator utility, nrg, to obtain an analysis of the Nagios service (plug-in). Select the analyze option to display a two-column listing of service status. The following is the command line entry for this feature: # nrg --mode analyze nh [Slurm Monitor - Critical] 'sinfo' reported problems with nodes in some partitions, specifically, some nodes may be marked with an '*' which indicates they may be unresponsive to SLURM.
• • • • “Changing Sensor Thresholds” (page 112) “Adjusting the Time Allotted for Metrics Collection” (page 112) “Changing the Default Nagios User Name” (page 113) “Disabling Individual Nagios Plug-Ins” (page 115) 8.3.1 Stopping and Restarting Nagios Nagios can record a multitude of alerts on large systems when many nodes undergo known maintenance operations. These operations can include restarting or shutting down the HP XC system.
NOTE: If you change the nagios_vars.ini file, you must propagate it to all nodes. For more information, see Chapter 10 (page 129).
Figure 8-8 Nagios Configuration
When you change the Nagios configuration, you must perform the following tasks:
1. Read the Nagios documentation carefully.
2. Change the template files accordingly.
3. Stop the Nagios service. For instructions on how to stop the Nagios service, see "Stopping and Restarting Nagios" (page 110).
# 'nagios' contact definition
define contact{
        contact_name                    nagios
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email,notify-by-epager
        host_notification_commands      host-notify-by-email,host-notify-by-epager
        email                           nagios@localhost.localdomain
        pager                           nagios@localhost.
        }
/opt/hptc/nagios/etc/templates/nagios_template.cfg or /opt/hptc/nagios/etc/templates/nagios_monitor.cfg template file.
Alternatively, you can use NIS to change the user account name if this is appropriate for your site.
NOTE: This example retains the default user ID for Nagios.
4. Change the line:
nagios_user=nagios
to
nagios_user=newname
in each of the following files:
• /opt/hptc/nagios/etc/nagios.cfg
• /opt/hptc/nagios/etc/nagios_monitor.cfg
• /opt/hptc/nagios/etc/nrpe.cfg
• /opt/hptc/nagios/etc/nsca.cfg
NOTE: Complete steps 5 through 10 only for a new user name that was added after the cluster_config utility was run.
# pdsh -a "service nagios nconfigure"
11. Restart Nagios. For instructions on how to restart Nagios, see "Stopping and Restarting Nagios" (page 110).
8.3.7 Disabling Individual Nagios Plug-Ins
All the Nagios plug-ins developed for the HP XC system are enabled by default. However, you can modify the /opt/hptc/nagios/etc/templates/*_template.cfg files to customize the service checks as needed.
IMPORTANT: Do not modify files in the /opt/hptc/nagios/etc directory with file names of the form *_local.
8.4.1 Monitored Nagios Services Table 8-2 lists each Nagios service, also known as a plug-in, that Nagios monitors, by category and the function. The items in the Nagios Service column of this table correspond to the Service column of the Nagios Service Detail View and Service Problems View windows. Figure 8-4 (page 106) and Figure 8-6 (page 108) show an example of these windows, respectively.
Table 8-2 Monitored Nagios Services (continued) Category Nagios Service Function Individual Status Reported for Configuration All Nodes Switch and Appliance Status This plug-in reports the configuration information for a single node. Environment This plug-in provides per-node sensor reporting and alerts. It reports on an individual node's sensor status. Depending on the node type, all available “live” sensors are reported. Select the link in the Status Information column for detailed information.
Table 8-3 Default Settings for Monitored Nagios Services (continued) Actively Launched on Max.
• /opt/hptc/nagios/libexec/sensorData.dat
Contains patterns for alerting based on sensor results.
Nagios uses e-mail to send formatted alerts.
The HP XC system event log functionality provides complete management of all log types of supported HP platforms. Log information is regularly read, archived, and used to generate Nagios alerts when applicable. Logs that approach a critical size are cleared to prevent loss of event data. Event logs are typically accessed through the management port. They require platform- and protocol-specific user authentication as well as network access to the console port (cp-nxxx, where nxxx is the node number).
The default values specified in the nand.conf and nanc.conf configuration files are appropriate for most HP XC systems; however, you can change these values to suit your installation. Follow these steps when updating either or both of these configuration files.
1. Log in as the superuser on the Nagios master node.
2. Edit the nanc.conf or nand.conf file.
3. Use the service command to stop the nagios daemon:
# pdsh -a "service nagios stop"
Example 8-1 The nrg Utility System State Analysis # nrg --mode analyze Nodelist ---------------------n[3-7] nh 122 Description --------------------------------------------------[Environment - NODATA] No sensor data is available for reporting. Use 'shownode metrics sensors -last 20m node xxxx' for each of these nodes to verify if sensor data has been recently collected. This status is drawn from the same source as the shownode metrics sensors command.
nh [Slurm Monitor - Critical] 'sinfo' reported problems with nodes in some partitions, specifically, some nodes may be marked with an '*' which indicates they may be unresponsive to SLURM. Run 'sinfo' for more information. n[3-7] [Slurm Status - Critical] sinfo reported problems with partitions for this node nh [Supermon Metrics Monitor - Critical] The metrics monitor has returned a critical status indicating a number of nodes have reported critical thresholds.
received by the Nagios engine. Service *may* be fine, but if it continues to pend for more then about 30 minutes it may indicate data is not being collected.
This utility can generate:
• A list of nodes according to their severity:
  - Critical
  - Warning
  - Ok
  - Unknown
  - Pending
• Nagios hosts status.
• Nagios services status.
• Nagios monitors status.
• A list of nodes that are up or down.
For additional information about this utility, see nrg(8).
9 Network Administration
This chapter addresses the following network topics:
• "Network Address Translation Administration" (page 125)
• "Network Time Protocol Service" (page 126)
• "Changing the External IP Address of a Head Node" (page 127)
• "Modifying Sendmail" (page 127)
9.1 Network Address Translation Administration
Network Address Translation (NAT) enables compute nodes that do not contain external devices to have external network access.
When nodes are configured as NAT clients, the default gateways are established. By default, each NAT client has a single gateway. If a NAT server fails, however, the NAT client loses connectivity. You can configure a system for multiple gateways to lessen the possibility of loss of connectivity, but the system may have performance problems. External access from NAT clients using UDP has been shown to work well, however.
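To see which gateway a particular NAT client is currently using, query its routing table with standard Linux tools. A minimal sketch; the node name is an example, and the default route in the output identifies the NAT server acting as that client's gateway:
# pdsh -w n5 /sbin/route -n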
9.3 Changing the External IP Address of a Head Node Use the following procedure to change the external IP address of the head node: NOTE: This procedure requires you to reboot the head node. 1. Edit the /etc/sysconfig/netinfo file as follows: a. Specify the new head node external IP address in the --ip option of the network command. b. Ensure that the MAC address corresponds to the proper Network Interface Card (NIC). c.
10 Distributing Software Throughout the System This chapter addresses the following topics: • “Overview of the Image Replication and Distribution Environment” (page 129) • “Installing and Distributing Software Patches” (page 130) • “Adding Software or Modifying Files on the Golden Client” (page 131) • “Determining Which Nodes Will Be Imaged” (page 135) • “Golden Image Checksum” (page 135) • “Updating the Golden Image” (page 136) • “Propagating the Golden Image to All Nodes” (page 139) • “Maintaining a Globa
10.2 Installing and Distributing Software Patches
The following is a generic procedure for installing software patches:
1. Log in as superuser (root) on the head node.
2. Use the rpm command to install the software package on the head node:
# rpm -Uvh package.rpm
The name of the software package usually contains a revision number and a designator for the hardware platform: either x86_64 for the CP3000 and CP4000 systems or ia64 for the CP6000 systems.
10.3 Adding Software or Modifying Files on the Golden Client The first step in managing software changes to your HP XC system is to update the golden client node. This can involve adding new software packages, adding a new user, or modifying a configuration file that is replicated across the HP XC system, such as a NIS or NTP configuration file. Note: It is important to have a consistent procedure for managing software updates and changes to your HP XC system.
1. Make a copy of the appropriate master autoinstallation script.
2. Use the text editor of your choice to modify the OVERRIDES variable to match your overrides subdirectory name, /name.
3. In the /var/lib/systemimager/scripts directory, create symbolic links to this master autoinstallation script for the nodes that will receive this override. The symbolic link names must follow the format name.sh, where name is the host name of each node to receive the override.
8. Link all the login nodes (that is, those nodes with the lvs service) to the compiler.master.0 autoinstallation script:
# for i in $(expandnodes $(shownode servers lvs))
do
ln -sf compiler.master.0 $i.sh
done
9. Verify that the links are correct:
# ls -l
. . .
lrwxrwxrwx 1 root root ... n7.sh -> compiler.master.0
lrwxrwxrwx 1 root root ... n8.sh -> compiler.master.0
lrwxrwxrwx 1 root root ... n9.sh -> compiler.master.0
Now the system can be imaged.
All the nodes to be overridden must be linked.
7. Use the text editor of your choice to edit the /etc/systemimager/flamethrower.conf file. Add an entry at the end of the file for the directory you created in the first step. In this example, add the entry shown in bold:
[base_image]
DIR = /var/lib/systemimager/images/base_image
[override_n8_override]
DIR = /var/lib/systemimager/overrides/n8_override
[override_base_image]
DIR = /var/lib/systemimager/overrides/base_image
Save the file and exit the text editor.
Global service configuration scripts are located in the /opt/hptc/etc/gconfig.d directory. • Node-Specific Configuration The node-specific service configuration step uses the results of the global service configuration step described previously to apply to a specific node its “personality” with respect to the service. User interaction is not permitted because this step runs on a per-node basis.
RPCNFSDCOUNT: 8
xc_version: Version number
The table entry golden_image_md5sum identifies the MD5 checksum of the golden image file structure. The table entry golden_image_modification_time identifies the date and time the current golden image was created. The table entry golden_image_tar_valid is set to 1 when the compressed tar file of the golden image is created. It is set to 0 during the creation of the golden image tar file.
Note: Before updating the golden image, make a copy in case you need to revert back. Use the SystemImager si_cpimage command to perform this task. Ensure that you have enough disk space in the target directory where the image will be saved; image sizes are typically 3–6 GB and the size of a compressed tar file of an image is generally 1–3 GB. The following command makes a copy of the default golden image, base_image, in the /var/lib/systemimager/images directory.
3. At some time, you may need to reinstall or modify the existing configuration.
4. Copy the saved configuration file from the external file system back on to an HP XC system file system, preferably to a disk local to the head node.
5. Run the cluster_config command with its --infile option; specify the saved configuration file:
# cluster_config --infile conf_file
For more information about this command, see cluster_config(8) and the HP XC System Software Installation Guide.
The image replication and distribution environment uses three separate exclusion files: • /opt/hptc/systemimager/etc/base_exclude_file Used during the initial creation of the golden image, which occurs as a result of executing the cluster_config command. The golden client has very little personality at this time, so this exclude file is fairly sparse. After the initial golden image is created from the golden client, the golden client is configured, and it takes on its node-specific personality.
# startsys --image_and_boot
# transfer_to_avail
10.7.2 Using the cexec Command
You can use the cexec command to copy designated files to client nodes. Its advantage is that it is performed immediately, and potentially with little effect on the HP XC system configuration files, presuming the file list does not contain any configuration files that require the restarting of services.
7. Execute all gconfig scripts and update the golden image: # /opt/hptc/config/sbin/cluster_config 8. Determine the name of the node that provides the imageserver service with the shownode command: # shownode servers imageserver The node name returned is used as the argument to the --server option in the next step. 9. Distribute the golden image update to all nodes by using the information in “Propagating the Golden Image to All Nodes” (page 139). 10.
11 Opening an IP Port in the Firewall
This chapter addresses the following topics:
• "Open Ports" (page 143)
• "Opening Ports in the Firewall" (page 144)
11.1 Open Ports
Each node in an HP XC system is set up with an IP firewall, for security purposes, to block communications on unused network ports. External system access is restricted to a small set of externally exposed ports. Table 11-1 lists the base ports that are always open by default; these ports are labeled "External".
Table 11-2 Service Ports
Flamethrower
  Internal or External: Internal
  Port Number: 9000 to 9020
  Protocol: udp
  Comments: The highest port number used is based on the number of modules configured to udpcast. Usually, the upper limit is 9020.
LSF
  Internal or External: External
  Port Number: 6878 to 6879, 6881 to 6883
  Protocol: tcp/udp
  Comments: Only if the HP XC system is set up as a member of a larger LSF cluster.
NOTE: Use the openipport command judiciously. The port remains open unless or until the node is reimaged, even if the node is rebooted. Typically, you would use the openipport command for each defined interface except the external interface. The following example opens port 44 in the firewall for the udp protocol on the Admin, Interconnect, and loopback interfaces on the current node. The --verbose option displays error messages, if any.
# set up port 389 on Interconnect interface:
-A RH-Firewall-1-INPUT -i Interconnect -p tcp -m tcp --dport 389 -j ACCEPT
# setup port 389 on admin interface
-A RH-Firewall-1-INPUT -i Admin -p tcp -m tcp --dport 389 -j ACCEPT
Ensure that this portion of the /etc/sysconfig/iptables.
12 Connecting to a Remote Console
This chapter addresses the following topics:
• "Console Management Facility" (page 147)
• "Accessing a Remote Console" (page 147)
12.1 Console Management Facility
The Console Management Facility (CMF) daemon collects and stores console output for all other nodes on the system. This information is stored individually for every node and is backed up periodically. This information is stored under dated directories in the /hptc_cluster/adm/logs/cmf.dated/current directory.
n16 login: username passwd: password 6. Enter the escape character returned by the console command in Step 3 to terminate the connection. Note: Some nodes, depending on the machine type, accept a key sequence to enter and exit their command-line mode. See Table 12-1 to determine if these key sequences apply to your node machine type. Do not enter the key sequence to enter command-line mode. Doing so stops the Console Management Facility (CMF) from logging the console data for the node.
13 Managing Local User Accounts and Passwords This chapter describes how to add, modify, and delete a local user account on the HP XC system.
• User's password Note: A customary practice is to assign a temporary password that the user changes with the passwd command, but this data must be propagated to all the other system nodes also. See “Distributing Software Throughout the System” (page 129) for more information. • User identifier number (UID) A default value is assigned if you do not supply this information; this value is usually the next available UID. If you assign a value, first make sure that it is not already in use.
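After you have gathered this information (and any other attributes your site requires), the account is typically created on the golden client with the standard Linux useradd and passwd commands; the user name, UID, home directory, and shell in the following sketch are illustrative only:
# useradd -u 1001 -d /home/jsmith -s /bin/bash -c "Jane Smith" jsmith
# passwd jsmith
Remember to propagate the change to the other nodes as described in "Distributing Software Throughout the System" (page 129).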
13.5 Deleting a Local User Account Remove a user account with the userdel command; you must be superuser on the golden client node to use this command. This command provides the -r option, which removes that user's home directory, all the files in that directory, and the user's mail spool. Make sure that you propagate these changes to all the other nodes in the system as described in “Distributing Software Throughout the System” (page 129). 13.
# rm -f /tmp/root_crontab 6. Verify the changes you made to the root user's crontab file: # crontab -l # DO NOT EDIT THIS FILE - edit the master and reinstall. # (/tmp/root_crontab installed on date time year) # (Cron version -- $Id: crontab.c,v 2.13 vixie Exp $) . . . 20 * * * * /usr/lib/yp/ypxfr_1perhour 40 6 * * * /usr/lib/yp/ypxfr_1perday 55 6,18 * * * /usr/lib/yp/ypxfr_2perday 13.
If you must change this password, be sure that you back up the CMDB as described in “Backing Up the Configuration Database” (page 79). You also need to update the CMDB manually, which is beyond the scope of this document. 13.8.
# telnet IRON00 IR0N00 login: admin Password: Welcome to Voltaire Switch IR0N00 connecting IR0N00> enable password: IR0N00# ? pass password update [admin, enable]password update [admin, enable] IR0N00# password update admin Insert new (up to 8 characters)password: Please retype new password: OK IR0N00# password update enable Insert new (up to 8 characters)password: Please retype new password: OK IR0N00# Also, enter the following command to set up ssh as root on the switch, which will enable you to access th
NOTE: Substitute the node name (for example, n144) for the headnode variable.
1. Use the text editor of your choice to add the following line to the /etc/hosts file:
external-ip-address cp-headnode
For information on the /etc/hosts file, see hosts(5).
2. Use the ipmitool to set the BMC password.
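A minimal sketch of the ipmitool invocation follows; the lan interface, the admin user name, and user ID 2 are assumptions that depend on how your BMC is configured, so consult ipmitool(1) and your management processor documentation before using it:
# ipmitool -I lan -H cp-headnode -U admin user set password 2 new-password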
At this time, the lsfadmin account password is changed only on the head node. 3. Update the golden image to ensure that the lsfadmin password change is propagated the next time the nodes are reinstalled: # updateimage --gc `hostname` --no-netboot For additional information on updating the golden image, see “Updating the Golden Image” (page 136). 4.
14 Managing SLURM The HP XC system uses the Simple Linux Utility for Resource Management (SLURM).
SLURM offers a set of utilities that provide information about SLURM configuration, state, and jobs, most notably scontrol, squeue, and sinfo. See scontrol(1), squeue(1), and sinfo(1) for more information about these utilities. SLURM enables you to collect and analyze job accounting information. “Configuring Job Accounting” (page 167) describes how to configure job accounting information on the HP XC system. “SLURM Troubleshooting” (page 240) provides SLURM troubleshooting information. 14.
Table 14-1 SLURM Configuration Settings (continued) Setting Default Value* SwitchType switch/elan for systems with the Quadrics interconnect switch/none for systems with any other interconnect * Default values can be adjusted during installation. You can also use the scontrol show config command to examine the current SLURM configuration.
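For example, to check a single setting such as the switch plug-in without paging through the entire listing, filter the output:
# scontrol show config | grep SwitchType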
slurmctld daemon and (if there is more than one node with the resource management role) the node to run the backup slurmctld daemon. Be sure to shut down SLURM on the HP XC system before adjusting these settings manually. See the HP XC System Software Installation Guide for information about changing the choice of primary and backup nodes for SLURM by using the cluster_config utility. 14.2.3 Configuring Nodes in SLURM You can change the configuration of a set of nodes by editing the slurm.conf file.
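Before editing the file, it can be useful to review what SLURM currently records for a node; the node name in this example is illustrative:
# scontrol show node n5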
If you make any changes, be sure to run the scontrol reconfigure command to update SLURM with these new settings. 14.2.4 Configuring SLURM Partitions Nodes are grouped together in partitions. A node can belong to only one partition. A SLURM job cannot be scheduled to run across partitions. Only the superuser (root) and the SLURM system administrator (SlurmUser) are allowed to allocate resources for any other user.
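As an illustration only (the partition name, node list, and option values here are assumptions, not a recommended configuration), a partition definition in the slurm.conf file takes a form similar to the following:
PartitionName=lsf Nodes=n[1-8] Default=YES Shared=FORCE State=UP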
Table 14-2 SLURM Partition Characteristics (continued)

Characteristic: MinNodes
Description: Specifies the minimum number of nodes that can be allocated to any single job. The default is 1.

Characteristic: Shared
Description: A text string that indicates whether node sharing for jobs is allowed:
YES The node may be shared or not, depending on the allocation.
FORCE The node is always available to be shared.
NO The node is never available to be shared.

Characteristic: State
Description: The state of the partition. The possible values are UP or DOWN.
Example 14-1 Using a SLURM Feature to Manage Multiple Node Types a. Use the text editor of your choice to edit the slurm.conf file to change the node configuration to the following: NodeName=exn[1-64] Procs=2 Feature=single,compute NodeName=exn[65-96] Procs=4 Feature=dual,compute NodeName=exn[97-98] Procs=4 Feature=service Save the file. b. Update SLURM with the new configuration: # scontrol reconfig c. Verify the configuration with the sinfo command. The output has been edited to fit on the page.
pending signals (-i) 1024 max locked memory (kbytes, -l) 128 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 8113 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Only soft resource limits can be manipulated. Soft and hard resource limits differ.
In the following example, all the system limits except RLIMIT_NPROC, which determines the maximum number of processes, and RLIMIT_NOFILE, which determines the maximum number of open files, are propagated from the user's environment. PropagateResourceLimitsExcept=RLIMIT_NPROC,RLIMIT_NOFILE If you make any changes, be sure to run the scontrol reconfigure command to update SLURM with these new settings.
job accounting log file. The default (and recommended) job accounting log file is /hptc_cluster/slurm/job/jobacct.log. SLURM job accounting attempts to gather all the statistics available on the systems on which it is run.
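After accounting data has been collected, you can query the log with the sacct command; the job ID and option shown here are illustrative, so see sacct(1) for the options supported by your version:
$ sacct
$ sacct -j 1234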
The bacct command reports a slightly increased value for a job's runtime when compared to the value reported by the sacct command. LSF-HPC with SLURM sums the resource usage values reported by itself and SLURM.
14.4.2 Disabling Job Accounting
Job accounting is turned on by default. Note that job accounting is required if you are using LSF. Follow this procedure to turn off job accounting:
1. Log in as the superuser on the SLURM server (see “Configuring SLURM Servers” (page 159)).
2.
You can choose to isolate this data log on one node or in the /hptc_cluster directory so that all nodes can access it. However, this log file must be accessible to the following:
• Nodes that run the slurmctld daemon
• LSF
• Any node from which you execute the sacct command
Note: Ensure that the log file is located on a file system with adequate storage to avoid file system full conditions. This example uses the file /hptc_cluster/slurm/job/jobacct.log, which is the default and recommended file.
StaggerSlotSize Generally, the increment of time a process pauses before sending its message. For n tasks, an equal number of staggered time slots are defined in increments of (StaggerSlotSize * 0.001) seconds. The first task sends its message immediately; the second task pauses one increment before sending its message; the third task pauses two increments before sending its message; and so on. The default value of this parameter is 1.
lsf        up   infinite   122   idle   n[5-16,18-127]
lsf        up   infinite     1   down   n17
swaptest   up   infinite     4   idle   n[1-4]
In this example, node n17 is down. The squeue utility reports the state of jobs currently running under SLURM's control. For more information about the squeue utility, see squeue(1). The SLURM log files on each node in /var/slurm/log are helpful for diagnosing specific problems. The log files slurmctld.log and slurmd.log record entries from their respective daemons.
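If a node such as n17 has been marked down and you have corrected the underlying problem, you can review its state and return it to service with the scontrol command; for example:
# scontrol show node n17
# scontrol update NodeName=n17 State=resume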
Table 14-4 Output of the sinfo command for Various Transitions Transition Cause: sinfo shows: Meaning: Transient Network Congestion alloc The node is running a job alloc* The slurmctld daemon has lost contact with the node alloc Contact between the node and the slurmctld daemon has been restored Node fails while no job is running on the idle node. idle* Node fails while a job is running on the node The System Administrator sets the node state to down.
EPILOG_EXCLUDE_NODES="n101 n102 n103 n104 n105" The SLURM epilog is located at /opt/hptc/slurm/etc/slurm.epilog.clean initially. You can maintain the file in this directory, move it to another directory, or move it to a shared directory. If you decide to maintain this file in a local directory on each node, be sure to propagate the SLURM epilog file to all the nodes in the HP XC system. The following example moves the SLURM epilog file to a shared directory: # mv /opt/hptc/slurm/etc/slurm.epilog.
14.9 Enabling SLURM to Recognize a New Node
Use the following procedure to enable SLURM to recognize a new node, that is, a node known to the HP XC system but not managed by SLURM. This procedure adds node n9 to the SLURM lsf partition, which already consists of nodes n1 through n8.
1. Log in to the head node as the superuser (root).
2. Log in to the node to be added to gather information on the node's characteristics:
a.
NodeName=n[1-5] Procs=2 RealMemory=1994
NodeName=n[6-8] Procs=2 RealMemory=4032
These lines change to:
NodeName=n[1-5] Procs=2 RealMemory=1994
NodeName=n[6-8] Procs=2 RealMemory=4032
NodeName=n9 Procs=2 RealMemory=2008
NOTE: If the value for the RealMemory characteristic of node n9 were 4032 in the example, the portion of the file would be changed to the following:
NodeName=n[1-5] Procs=2 RealMemory=1994
NodeName=n[6-9] Procs=2 RealMemory=4032
The order of NodeName arguments listed in this file is important
1. Log in as superuser (root) on the head node.
2. Shut down SLURM:
# scontrol shutdown
3. Run the configuration scripts to remove SLURM from the head node:
# /opt/hptc/slurm/etc/gconfig.d/slurm_gconfig.pl gunconfigure
# /opt/hptc/slurm/etc/nconfig.d/slurm_nconfig.pl nunconfigure
4. Update the golden image. See Chapter 10 (page 129) for more information.
5. Propagate the new golden image to all nodes. See Chapter 10 (page 129) for more information.
14.
15 Managing LSF The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities.
The log directory is moved to /var/lsf so that per-node LSF daemon logging is stored locally and is unaffected by updateimage operations. However, the logs are lost during a reimage operation. The LSF directory containing the binary files remains in /opt/hptc/lsf/top; it is imaged to all the other nodes. Also, during the operation of the cluster_config utility, the HP XC nodes without the compute role are configured to remain closed, with 0 job slots available for use.
(nodes) in the system. The LSF-HPC with SLURM daemons consolidate this information into one entity, such that these daemons present the HP XC system as one virtual LSF host. Note: LSF-HPC with SLURM operates only with the nodes in the SLURM lsf partition. As mentioned in the previous paragraph, LSF-HPC with SLURM groups these nodes into one virtual LSF host, presenting the HP XC system as a single, large SMP host.
The bqueues -l command displays the full queue configuration, including whether or not a job starter script has been configured. See the Platform LSF documentation or bqueues(1) for more information on the use of this command. For example, consider an LSF-HPC with SLURM configuration in which node n20 is the LSF execution host and nodes n[1-10] are in the SLURM lsf partition. The default normal queue contains the job starter script, but the unscripted queue does not have the job starter script configured.
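To check quickly whether a given queue has the job starter script configured, you can filter the detailed listing; the grep pattern assumes the JOB_STARTER field name that bqueues -l normally displays:
$ bqueues -l normal | grep -i job_starter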
15.2.1.2 SLURM External Scheduler The integration of LSF-HPC with SLURM includes the addition of a SLURM-based external scheduler. Users can submit SLURM parameters in the context of their jobs. This enables users to make specific topology-based allocation requests. See the HP XC System Software User's Guide for more information. 15.2.1.3 SLURM lsf Partition An lsf partition is created in SLURM; this partition contains all the nodes that LSF-HPC with SLURM manages.
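You can confirm which nodes are currently in this partition with the sinfo command, for example:
$ sinfo -p lsf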
The Nagios infrastructure contains a module that monitors the LSF-HPC with SLURM virtual IP. If it detects a problem with the virtual IP (for example, the inability to ping it), the monitoring code assumes the node is down and chooses a new LSF execution host from the backup candidate nodes on which to set up the virtual IP and restart LSF-HPC with SLURM. See “LSF-HPC with SLURM Failover” (page 189) for more information. 15.
work
The work directory is moved to /hptc_cluster/lsf/work; it is linked through a soft link to /opt/hptc/lsf/top/work.
log
The log directory is moved to /var/lsf/log; it is linked through a soft link to /opt/hptc/lsf/top/log. This ensures that all LSF-HPC with SLURM logging remains local to the node currently running LSF-HPC with SLURM.
6.2
This directory remains in place and is imaged to each node of the HP XC system.
# controllsf start This command searches through a list of nodes with the lsf service until it finds a node to run LSF-HPC with SLURM. Alternatively, you can invoke the following command to start LSF-HPC with SLURM on the current node: # controllsf start here 15.5.2 Shutting Down LSF-HPC with SLURM At system shutdown, the /etc/init.d/lsf script ensures an orderly shutdown of LSF-HPC with SLURM.
Example 15-4 is a similar example, but 20 processors are reserved. Example 15-4 Launching Another Job Without the JOB_STARTER Script Configured $ bsub -I -n20 hostname Job <21> is submitted to default queue . <> <> n120 In both of the previous examples, processors were reserved but not used.
mmand date time: Submitted from host , CWD <$HOME>, Ou tput File <./>, 8 Processors Requested; date time: Started on 8 Hosts/Processors <8*lsfhost.
. Resource usage summary: CPU time : Max Memory : Max Swap : 8252.65 sec. 4 MB 113 MB . . .
The lshosts command reports the following resource information: • ncpus The total number of available processors within the SLURM lsf partition. This value is calculated as the minimum value between the number of processors in all available nodes in the lsf partition and the number of licensed cores. If total number of usable cores is 0, LIM sets the value of ncpus to 1 and closes the host. • maxmem The minimum value of configured SLURM memory for all nodes.
When LSF-HPC with SLURM is down, the response of the check_lsf plug-in depends on whether LSF-HPC with SLURM failover is enabled or disabled: • When LSF-HPC with SLURM failover is disabled The check_lsf plug-in returns an immediate failure notification to Nagios. • When LSF-HPC with SLURM failover is enabled The check_lsf plug-in decides if LSF-HPC with SLURM is supposed to be running.
LSF-HPC with SLURM monitoring and failover are implemented on the HP XC system as tools that prepare the environment for the LSF execution host daemons on a given node, start the daemons, then watch the node to ensure that it remains active. After a standard installation, the HP XC system is initially configured so that: • • • LSF-HPC with SLURM is started on the head node. LSF-HPC with SLURM failover is disabled.
You can use the controllsf command to change these assignments. • controllsf disable headnode preferred Specifies that the head node should be ordered at the end of the list, rather than at the head. • controllsf disable slurm affinity Specifies that HP XC should attempt to place the SLURM and LSF-HPC with SLURM daemons on separate nodes. • controllsf set primary nodename Specifies that LSF-HPC with SLURM start on some node other than the head node by default.
Table 15-3 Environment Variables for LSF-HPC with SLURM Enhancement (lsf.conf File) Environment Variable Description LSB_RLA_PORT=port_number This entry specifies the TCP port used for communication between the LSF-HPC with SLURM allocation adapter (RLA) and the SLURM scheduler plug-in. The default port number is 6883. LSB_RLA_TIMEOUT=seconds This entry defines the communications timeout between RLA and its clients (for example, sbatchd and the SLURM scheduler plug-in.) The default value is 10 seconds.
Table 15-3 Environment Variables for LSF-HPC with SLURM Enhancement (lsf.conf File) (continued) Environment Variable Description LSF_HPC_EXTENSIONS="ext_name,..." This setting enables Platform LSF extensions. This setting is undefined by default. The following extension names are supported: • SHORT_EVENTFILE This compresses long host name lists when event records are written to the lsb.events and lsb.acct files for large parallel jobs.
Table 15-3 Environment Variables for LSF-HPC with SLURM Enhancement (lsf.conf File) (continued) Environment Variable Description LSF_NON_PRIVILEGED_PORTS=Y|y Some LSF-HPC with SLURM communication can occur through privileged ports. This setting disables privileged ports usage ensuring that no communication occurs through privileged ports. Disabling privileged ports helps to ensure system security.
Table 15-4 Environment Variables for LSF-HPC with SLURM Enhancement (lsb.queues File) Environment Variable Description DEFAULT_EXTSCHED= SLURM[options[;options]...] This entry specifies SLURM allocation options for the queue. The -ext options to the bsub command are merged with DEFAULT_EXTSCHED options, and -ext options override any conflicting queue-level options set by DEFAULT_EXTSCHED.
Table 15-4 Environment Variables for LSF-HPC with SLURM Enhancement (lsb.queues File) (continued) Environment Variable Description This specifies mandatory SLURM allocation options for the queue. The -ext options of the bsub command are merged with MANDATORY_EXTSCHED options, and MANDATORY_EXTSCHED options override any conflicting job-level options set by the -ext option. This setting overrides the -ext options of the bsub command.
15.15 Configuring an External Virtual Host Name for LSF-HPC with SLURM on HP XC Systems
Configure an external virtual host name when the LSF-HPC with SLURM virtual host on an HP XC system must be reachable from the external network. This access could be required if the HP XC system is added to an existing LSF cluster, or if the HP XC system is 'Multi-Clustered' with another LSF cluster. See the LSF documentation for more details on LSF Multi-Clusters. Perform the following steps to configure an external virtual host name:
1.
16 Managing Modulefiles This chapter describes how to load, unload, and examine modulefiles. Modulefiles provide a mechanism for accessing software commands and tools, particularly for third-party software. The HP XC System Software does not use modules for system-level manipulation. A modulefile contains the information that alters or sets shell environment variables, such as PATH and MANPATH. Some modulefiles are provided with the HP XC System Software and are available for you to load.
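A typical interactive session lists the available modulefiles, loads one, and then confirms what is loaded; the modulefile name in this sketch is illustrative because the names available depend on what is installed on your system:
$ module avail
$ module load mpi
$ module list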
17 Mounting File Systems This chapter provides information and procedures for performing tasks to mount file systems that are internal and external to the HP XC system. It addresses the following topics: • “Overview of the Network File System on the HP XC System” (page 201) • “Understanding the Global fstab File” (page 201) • “Mounting Internal File Systems Throughout the HP XC System” (page 203) • “Mounting Remote File Systems” (page 207) 17.
Example 17-1 Unedited fstab.proto File # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead. # # How this file is organized: # # * Comments begin with # and continue to the end of line # # * Each non-comment line is a line that may be copied # to /etc/fstab verbatim.
#% n[60-63] The file systems can be either of the following: • External to the node, but internal to the HP XC system. “Mounting Internal File Systems Throughout the HP XC System” (page 203) describes this situation. The use of csys is strongly recommended. For more information, see csys(5). • External to the HP XC system. “Mounting Remote File Systems” (page 207) describes this situation. NFS mounting is recommended for remote file system mounting. 17.
Figure 17-1 Mounting an Internal File System
17.3.1 Understanding the csys Utility in the Mounting Instructions
The csys utility provides a facility for managing file systems on a systemwide basis. It works in conjunction with the mount and umount commands by providing a pseudo file system type. The csys utility is documented in csys(5).
options Specifies a comma-separated list of options, as defined in csys(5). The hostaddress and _netdev options are mandatory. • The hostaddress argument specifies the node that serves the file system to other nodes and specifies the network (administration or system interconnect) to be used. The hostaddress is specified either by its node name or by its IP address.
3. Determine whether you want to mount this file system over the administration network or over the system interconnect. As a general rule, specify the administration network for administrative data and the system interconnect for application data. 4. Edit the /hptc_cluster/etc/fstab.proto file as follows: a. Locate the node designator that specifies the node or nodes that will mount the file system.
Example 17-2 The fstab.proto File Edited for Internal File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
Figure 17-2 Mounting a Remote File System
17.4.1 Understanding the Mounting Instructions
The syntax of the fstab entry for remote mounting using NFS is as follows:
exphost:expfs mountpoint fstype options
exphost Specifies the external server that is exporting the file system. The exporting host can be expressed as an IP address or as a fully qualified domain name.
1. Determine which file system to export. In this example, the file system /extra is exported by the external server xeno. 2. Ensure that this file system can be NFS exported. Note: This information is system dependent and is not covered in this document. Consult the documentation for the external server. 3. 4. Log in as superuser on the head node. Ensure that the mount point directory exists on all the nodes that will mount the remote file system.
Example 17-3 The fstab.proto File Edited for Remote File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
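As a sketch of the kind of entry this procedure produces for the xeno:/extra example (the node designator and the NFS mount options shown here are assumptions; adjust them for your site), the relevant portion of the file might resemble the following:
#% n[21-25]
xeno:/extra /extra nfs rw,_netdev 0 0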
18 Managing Software RAID Arrays The HP XC system can mirror data on a RAID array. This chapter addresses the following topics: • • • • • • “Overview of Software RAID” (page 211) “Installing Software RAID on the Head Node” (page 211) “Installing Software RAID on Client Nodes” (page 211) “Examining a Software RAID Array” (page 212) “Error Reporting” (page 213) “Removing Software RAID from Client Nodes” (page 213) 18.
3. Edit the /etc/systemimager/systemimager.conf file as follows to specify a list of nodes that will use software RAID: • Add or change the following entry in this file to specify software RAID-0 for the list of nodes: SOFTWARE_RAID0_NODES = list of nodes For example, use SOFTWARE_RAID0 = n[1–5,9] for nodes n1, n2, n3, n4, n5, and n9.
Number  Major  Minor  RaidDevice  State
   0      8      1        0       active sync   /dev/sda1
   1      8     17        1       active sync   /dev/sdb1
UUID : eead90a0:35c0bf46:9160b26b:2d754a4d
Events : 0.10
Nagios uses the mdadm command to verify the status of the RAID array.
18.5 Error Reporting
Errors can be reported during the installation of software RAID on a client node. If the installation software does not find two or more disks, the client node is imaged without software RAID, and a warning message is sent to the /hptc_cluster/admin/logs/imaging.
19 Using Diagnostic Tools This chapter discusses the diagnostic tools that the HP XC system provides. It addresses the following topics: • • • • “Using the sys_check Utility” (page 215) “Using the ovp Utility for System Verification” (page 215) “Using the dgemm Utility to Analyze Performance” (page 221) “Using the System Interconnect Diagnostic Tools” (page 222) Troubleshooting procedures are described in Chapter 20: Troubleshooting (page 229). 19.
• • All application nodes are responding and available to run applications. The nodes in the HP XC system are performing optimally. These nodes are tested for the following: — CPU core usage — CPU core performance — Memory usage — Memory performance — Network performance under stress — Bidirectional network performance between pairs of nodes — Unidirectional network performance between pairs of nodes For a complete list of verification tests, see ovp(8).
file_integrity server_status Test list for SLURM: spconfig daemon_responds partition_state node_state Test list for LSF: identification hosts_static_resource_info hosts_status Test list for interconnect: myrinet/monitoring_line_card_setup Test list for nagios: configuration Test list for xring: xring (X) Test list for perf_health: cpu_usage memory_usage cpu memory network_stress network_bidirectional network_unidirectional Test list for myrinet_status: myrinet_status An 'X' indicates support for extended v
To verify HP XC licensing, enter a command in the following format: # ovp -v=license By default, if any part of the verification fails, the ovp command ignores the test failure and continues with the next test. You can use the --failure_action option to control how the ovp command treats test failures. When you run the ovp command as superuser (root), it stores a record of the verification in a log file in the /hptc_cluster/adm/logs/ovp directory.
network_unidirectional Tests network performance between pairs of nodes using the HP MPI ping_pong_ring test. By default, the ovp command reports whether the nodes passed or failed the given test. Use the ovp --verbose option to display additional information. The results of the test are written to a file in the home directory of the user who ran the test or the user designated with the --user=user option. The file name has the form ovp_node_date[rx].
Details of this verification have been recorded in: /hptc_cluster/lsf/home/ovp_n16_mmddyy.log The following example tests the memory of nodes n11, n12, n13, n14, and n15. The --keep option preserves the test data in a temporary directory. # ovp --verbose --opts=--nodelist=n[11-15] --keep \ -verify=perf_health/memory XC CLUSTER VERIFICATION PROCEDURE date time Verify perf_health: Testing memory ...
19.3 Using the dgemm Utility to Analyze Performance You can use the dgemm utility, in conjunction with other diagnostic utilities, to help detect nodes that may not be performing at their peak performance. When a processor is not performing at its peak efficiency, the dgemm utility displays a WARNING message. Be sure to run this command from a systemwide directory, such as /hptc_cluster, because the dgemm utility's output files are written by the nodes.
$ bsub -nmax -o ./ mpirun -prot -TCP -srun -v -n max \ /opt/hptc/contrib/bin/dgemm.x The max parameter is the maximum number of processors available to you in the lsf partition. No warning messages appear when all the specified nodes are performing at their peak efficiency. 19.4 Using the System Interconnect Diagnostic Tools Various tools enable you to diagnose the system interconnect.
The output from the gm_prodmode_mon is logged to /var/log/diag/myrinet/gm_prodmode_mon/links.log by default, but you can specify another directory with the -d option. Output is displayed to the stdout to show the progress of the diagnostic test. This command is configured to run once each hour by a crontab file in the /etc/cron.hourly directory. 19.4.1.2 The gm_drain_test Diagnostic Tool This diagnostic tool runs five tests for the Myricom® switches in an HP XC system.
# qsdiagadm date time date time date time date time date time qsdiagadm: diags database created (QMS64,rails=1) qsctrl: passed power control check (on) qsctrl: passed population check (ok) qsctrl: passed bus control check (ok) qsctrl: passed gateway check (bootp,0.0.0.0) . . . date date date date date date date . . . time time time time time time time qsctrl: qsctrl: qsctrl: qsctrl: qsctrl: qsctrl: qsctrl: passed passed passed passed passed passed passed bus control check (ok) gateway check (bootp,0.
The qselantestp tool also parses each node's output file and reports any errors it finds. 19.4.2.3 The qsnet2_level_test Diagnostic Tool The qsnet2_level_test utility is a useful tool for diagnosing the Quadrics system interconnect. This utility uses the pdsh command to execute itself (that is, the qsnet2_level_test utility) on all nodes simultaneously. Each node has a specific path through the network that the node is responsible for testing.
Level3 of the qsnet2_dmatest tests the links between the QM502 card and the QM501 rear switch cards through the midplane. If the QM502 cards are not installed, the test spawns across the nodes, then waits for the timeout period and reports all the nodes as "failed to complete". Example 1 The following example tests level1. All the nodes write their log files to the directory named level1, which is a subdirectory of the global directory /hptc_cluster/adm/logs/diag/quadrics.
19.4.2.4 The qsnet2_drain_test Diagnostic Tool This tool runs up to six tests for the Quadrics switches in an HP XC system: • • • • • • Runs the qsctrl utility to verify that the system interconnects are running within the proper environmental parameters for operation. Runs qsnet2_level_test at level 1. Runs qsnet2_level_test at level 2. Runs qsnet2_level_test at level 3. Runs qsportmap on federated systems to test the link cable connectivity. Runs qsnet2_level_test at level 4 on federated systems.
The format of this command is: ib_prodmode_mon [--help] [--verbose] [-d directory-name] The output from the ib_prodmode_mon is logged to /var/log/diag/ib/ib_prodmode_mon/ib_prodmode_mon.log by default, but you can specify another directory with the -d option. Output is displayed to the stdout to show the progress of the diagnostic test. This command is configured to run once each day by a crontab file in the /etc/cron.daily directory. 19.4.
20 Troubleshooting This chapter provides information to help you troubleshoot problems with HP XC systems. It addresses the following topics: • • • • • “General Troubleshooting” (page 229) “Nagios Troubleshooting” (page 229) “System Interconnect Troubleshooting” (page 235) “SLURM Troubleshooting” (page 240) “LSF-HPC Troubleshooting” (page 241) See also Chapter 19 (page 215) for information on available diagnostic tools that you can use to locate the source of the failure. 20.
20.2.2 Nagios Fails to Start If Nagios fails to start, one or more Nagios daemons failed to start on a Nagios master or Nagios monitor node. Use the following procedure to start the Nagios daemons manually to overcome this: 1. Determine if the node that fails to run Nagios is a Nagios master or a Nagios monitor.
Example 20-1 Running a Nagios Plug-In from the Command Line 1. 2. Log in as the nagios user. Proceed to the /opt/hptc/nagios/libexec directory. $ cd /opt/hptc/nagios/libexec 3. Use the ls command to locate the Nagios plug-in you want to execute, for example: $ ls *_sel 4. Optionally, invoke the Nagios plug-in with the --help option: $ .
it continues to pend for more then about 30 minutes it may indicate data is not being collected. n[8-15] nh [Load Average - ASSUMEDOK] Pending services are normal, they indicate data has not yet been received by the Nagios engine. Service *may* be fine, but if it continues to pend for more then about 30 minutes it may indicate data is not being collected. n15 [NodeInfo - ASSUMEDOK] Pending services are normal, they indicate data has not yet been received by the Nagios engine.
Service: Configuration Monitor Status Information: Node information This message reports the total number of nodes, the number of nodes enabled, the number of nodes disabled, and the number of nodes imaged. No action is required. Service: Environment Status Information: Node sensor status A warning or critical message indicates that one or more monitored sensors reported that a threshold has been exceeded. Correct the condition.
Service: Root key synchronization Status Information: Root SSH key synchronization status This entry provides the status of the root key synchronization. A warning or critical message indicates that the root ssh keys for one or more hosts are out of synchronization with the head node. The ssh and pdsh commands may not work for these nodes. Verify that the imaging is correct on the affected nodes.
Service: System Free Space Status Information: Node / and /var free space This entry typically displays the status of the /, /var, and /hptc_cluster file systems on the node. A warning or critical message indicates that the thresholds for the specific node were exceeded. Clean up disk space. 20.
6. Run the lsmod command to display loaded modules. You should have one Myrinet GM loadable module installed. # lsmod | grep -i gm gm 589048 3 The size may differ from this output. 7. The Myrinet myri0 interface should be up. Use the ifconfig command to display the interface network configuration: # ifconfig myri0 myri0 Link encap:Ethernet HWaddr 00:60:DD:49:2D:DA inet addr:172.22.0.4 Bcast:172.22.0.255 Mask:255.255.255.
qsqsql-1.0.12-2.2hpt . . . The version numbers for your HP XC system may differ from these. 5. Run the lsmod command to display loaded modules. You should have eight Quadrics loadable modules installed. # lsmod jtag eip ep rms elan4 elan3 elan qsnet 30016 86856 821112 48800 466352 606676 80616 101040 0 1 9 0 1 0 0 0 (unused) [eip] [ep] [ep] [ep elan4 elan3] [eip ep rms elan4 elan3 elan] The sizes may differ from this output. 6.
20.4.3 InfiniBand System Interconnect Troubleshooting The following troubleshooting information applies to the InfiniBand system interconnects. Perform these steps on any node on which you suspect a problem to determine if your HP XC system is configured properly. If these tests pass but you are still experiencing difficulty, see Chapter 19: Using Diagnostic Tools (page 215). 1.
# uname -a Linux n3 2.4.21-15.7hp.XCsmp #1 SMP date ... GNU/Linux Make sure that your system has InfiniBand boards installed: # lspci -v | grep Infini 04:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1) Subsystem: Mellanox Technology MT23108 InfiniHost 5. Make sure that the InfiniBand RPM is installed: # rpm -q -a | grep ibhost ibhost-biz-3.0.16.5_2-1hptc.k2.4.21_27.2hp.XCsmp The version number may differ. 6. Make sure that the InfiniBand loadable modules are installed: # lsmod . . .
RX packets:7 errors:0 dropped:0 overruns:0 frame:0
TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:420 (420.0 b) TX bytes:240 (240.0 b)
You can try to ping other nodes that are connected to the network.
8. You can find additional information about InfiniBand in the /proc/voltaire directory. Use the find command to display it:
# find /proc/voltaire -type f -print -exec cat {} \;
20.
Healthy node is down The most common reason for SLURM to list an apparently healthy node down is that a specified resource has dropped below the level defined for the node in the /hptc_cluster/slurm/etc/slurm.conf file. For example, if the temporary disk space specification is TmpDisk=4096, but the available temporary disk space falls below 4 GB on the system, SLURM marks it as down.
• When LSF-HPC with SLURM failover is disabled and the LSF execution host (which is not the head node) goes down, issue the controllsf command to restart LSF-HPC with SLURM on the HP XC system: # controllsf start • When failover is enabled, you need to intervene only when the primary LSF execution host is not started on HP XC system startup (when the startsys command is run). Use the controllsf command to restart LSF-HPC with SLURM.
21 Servicing the HP XC System
This chapter describes procedures for servicing the HP XC system. For more information, see the service guide for your cluster platform. This chapter addresses the following topics:
• “Adding a Node” (page 243)
• “Replacing a Client Node” (page 244)
• “Replacing a System Interconnect Board in a CP6000 System” (page 246)
• “Software RAID Disk Replacement” (page 246)
21.1 Adding a Node
The following procedure describes how to add one or more nodes to the HP XC system:
1. 2.
The cluster configuration is ready to be applied. Do it? [y/n] y b. When prompted, enter y to regenerate ssh keys: Root ssh keys for the cluster already exist (Warning: you will not be able to ssh/pdsh to other nodes until you reimage them) Would you like to regenerate them? ([n]/y) y c. d. When prompted, specify the same NTP server that you are using for SFS; the SFS and HP XC clocks must be synchronized. When prompted, supply the network type for your system: Enter the network type of your system.
The replacement node must have the identical (exact) hardware configuration to the node being replaced; the following characteristics must be identical: • • • Number of processors Memory size Number of ports 1. Prepare the replacement node hardware as described in the HP XC System Software Hardware Preparation Guide. Note the following requirements for each replacement node: • The node must be set to boot from the network. • The console port (cp) is set to request IP addresses through DHCP.
# scontrol update NodeName=n3 State=resume
If the node is a member of an availability set, you need to relocate its services back to the node using the appropriate availability tool command.
21.3 Replacing a System Interconnect Board in a CP6000 System
Use the following procedure to replace a Myrinet system interconnect board, an InfiniBand system interconnect board, or a Quadrics system interconnect board in a CP6000 system. The example commands in the procedure use node n3.
Two SATA disk drives, /dev/sda and /dev/sdb are shown in this example. Follow these steps to replace the RAID disk: 1. Examine the array.
4. Use the mdadm command to remove each partition from the RAID array: # mdadm /dev/md1 -r /dev/sdb1 # mdadm /dev/md2 -r /dev/sdb2 # mdadm /dev/md3 -r /dev/sdb3 5. 6. 7. 8. Physically remove the old disk according to the manufacturer's instructions. Physically install the new disk according to the manufacturer's instructions. Partition the new disk.
NOTE: Be sure to use the appropriate disk drive device name for the disk1 and disk2 parameters. For example, for IDE disks, the values could be /dev/hda and /dev/hdb, respectively. 2. Adjust the boot order: # /tmp/post-install/50all.bootorder.pl NOTE: If this file was deleted from the client, you can find it on the head node in /var/lib/systemimager/post-install/50all.bootorder.pl file. For CP4000 Systems You need to update the boot order using the GRand Unified Bootloader, also known as grub.
A Installing LSF-HPC with SLURM into an Existing Standard LSF Cluster This appendix describes how to join an HP XC system running LSF-HPC with SLURM (integrated with the SLURM resource manager) to an existing standard LSF Cluster without destroying existing LSF-HPC with SLURM configuration. After installation, the HP XC system is treated as one host in the overall LSF cluster, that is, it becomes a cluster within the LSF cluster.
• • You should also be familiar with the normal procedures in adding a node to an existing LSF cluster, such as: — Establishing default communications (rhosts or ssh keys) — Setting up shared directories — Adding common users You should also have read Chapter 15: Managing LSF (page 177) in this document. A.2 Requirement LSF-HPC for HP XC can only be added to an existing standard LSF cluster running the most up-to-date version of LSF V6.2 or later.
3. Consider removing this installation to avoid confusion: # /opt/hptc/etc/gconfig.d/C55lsf gunconfigure removing /opt/hptc/lsf/top/conf... removing /opt/hptc/lsf/top/6.2... removing /opt/hptc/lsf/top/work... removing /opt/hptc/lsf/top/log... removing /hptc_cluster/lsf/conf... removing /hptc_cluster/lsf/work... removing /var/lsf... In this step, you remove the LSF installation from the current LSF_TOP directory, /opt/hptc/lsf/top. 4. 5. Log out then log back in to clear the LSF environment settings.
a. Lower the firewall on the HP XC external network. LSF daemons communicate through pre-configured ports in the lsf.conf configuration file, but the LSF commands open random ports for receiving information when they communicate with the LSF daemons. Because an LSF cluster needs this "open" network environment, trying to maintain a firewall becomes challenging. Security-aware customers are welcome to try to get LSF running with firewalls, but those procedures are beyond the scope of this documentation.
The goal of these custom files is to source (only once) the appropriate LSF environment file: $LSF_ENVDIR/cshrc.lsf for csh users, and $LSF_ENVDIR/profile.lsf for users of sh, bash, and other shells based on sh. Create /etc/profile.d/lsf.sh and /etc/profile.d/lsf.csh on the HP XC system to set up the LSF environment on HP XC. Using /shared/lsf for the value of LSF_TOP as an example, the new files would resemble these: # cat lsf.
Set up the HP XC LSF-HPC with SLURM Startup Script d. The HP XC controllsf command can double as the Red Hat /etc/init.d/ service script for starting LSF-HPC with SLURM when booting the HP XC system and stopping LSF-HPC with SLURM when shutting it down. When starting LSF-HPC with SLURM, the controllsf command establishes the LSF alias and starts the LSF daemons.
1. Preserve the existing environment setup files. a. Change directory to the existing LSF_TOP/conf directory. b. Rename the setup files by appending a unique identifier. For the sample case: # cd /shared/lsf/conf # mv profile.lsf profile.lsf.orig # mv cshrc.lsf cshrc.lsf.orig On installation, LSF-HPC with SLURM provides its own profile.lsf and cshrc.lsf files, rename those files to a unique name and restore these files.
Checking selected tar file(s) ... ... Done checking selected tar file(s). Pre-installation check report saved as text file: /shared/lsf/hpctmp/lsf6.2_lsfinstall/prechk.rpt. ... Done LSF pre-installation check. Installing lsf binary files " lsf6.2_linux2.4-glibc2.3-amd64-slurm"... Copying lsfinstall files to /shared/lsf/6.2/install ... Done copying lsfinstall files to /shared/lsf/6.2/install Installing linux2.4-glibc2.3-amd64-slurm ... Please wait, extracting lsf6.2_linux2.4-glibc2.
... Done creating lsf_quick_admin.html lsfinstall is done. To complete your lsf installation and get your cluster "corplsf" up and running, follow the steps in "/shared/lsf/hpctmp/lsf6.2_lsfinstall/lsf_getting_started.html". After setting up your LSF server hosts and verifying your cluster "corplsf" is running correctly, see "/shared/lsf/6.2/lsf_quick_admin.html" to learn more about your new LSF cluster. A.
XC_LIBLIC=/opt/hptc/lib/libsyslic.so c. Make sure the LSF_NON_PRIVILEGED_PORTS option is disabled or removed from this file ('N' by default). In Standard LSF Version 6.2 this is not supported, and you will get "bad port" messages from the sbatchd and mbatchd daemons on a non-HP XC system node. d. If you use ssh for node-to-node communication, set the following variable in lsf.conf (assuming the ssh keys have been set up to allow access without a password): LSF_RSH=ssh 5. e.
A.8 Starting LSF on the HP XC System At this point lsadmin reconfig followed by badmin reconfig can be run within the existing LSF cluster (on plain in our example) to update LSF with the latest configuration changes. A subsequent lshosts or bhosts displays the new HP XC "node", although it will be UNKNOWN and unavailable, respectively. LSF can now be started on XC: # controllsf start This command sets up the virtual LSF alias on the appropriate node and then starts the LSF daemons.
Example A-3 Running Jobs as a User on an External Node Launching a Parallel Job $ bsub -I -n6 -R type=SLINUX64 srun hostname Job <413> is submitted to default queue . <> <> /home/test/.lsbatch/1113947197.413: line 8: srun: command not found $ bsub -I -n6 -R type=SLINUX64 /opt/hptc/bin/srun hostname Job <414> is submitted to default queue . <
• Examine the output of the ifconfig command on the HP XC LSF node to ensure that the LSF alias was properly established. If eth0 is the external network device, the LSF alias entry is eth0:lsf.
B Installing Standard LSF on a Subset of Nodes This document provides instructions for installing standard LSF on a subset of nodes in the HP XC system; another subset of nodes runs LSF-HPC with SLURM. This situation is useful for an HP XC system that is comprised of two different types of nodes, for example, a set of large SMP nodes (“fat” nodes) running LSF-HPC with SLURM and a set of “thin” nodes running Standard LSF, as Figure B-1 shows.
B.3 Sample Case Consider an HP XC system of 128 nodes consisting of: • • • A head node with a host name of xc128 6 large SMP nodes (or fat nodes) with the host names xc[1-6] 122 thin nodes. 114 of the thin nodes are compute nodes and have host names of xc7-120 B.4 Instructions 1. 2. 3. Log into the head node of the HP XC system as superuser (root). Do not log in though the cluster alias. Change directory to /opt/hptc/lsf/top/conf. Rename the existing setup files: # mv profile.lsf profile.lsf.
# sed -e "s?/etc/hptc-release?/var/slurm/lsfslurm?g" \ < profile.lsf.notxc > profile.tmp # sed -e "s?/etc/hptc-release?/var/slurm/lsfslurm?g" \ < cshrc.lsf.notxc > cshrc.tmp • Verify that only the filename changed: # diff profile.tmp profile.lsf.notxc 120c120 < # Currently we only check for HP-hptc: /var/slurm/lsfslurm --> # Currently we only check for HP-hptc: /etc/hptc-release 127c127 < _slurm_signature_file="/var/slurm/lsfslurm" --> _slurm_signature_file="/etc/hptc-release" # diff cshrc.tmp cshrc.lsf.
lsf_daemons "$1" b. c. Save the file and exit the text editor. Set the file permissions: # chmod 555 /opt/hptc/lsf/etc/slsf d. Create the appropriate soft link to the file: # ln -s /opt/hptc/lsf/etc/slsf /etc/init.d/slsf e. Enable the file: # chkconfig --add slsf # chkconfig --list slsf slsf0:off 1:off 2:off f. 4:on 5:on 6:off Edit the /opt/hptc/systemimager/etc/chkconfig.map file to add the following line to enable this new "service" on all nodes in the HP XC system: slsf 8.
11. Use the sinfo and lshosts commands to verify the SLURM nodes and partitions and LSF hosts, respectively: # sinfo PARTITION AVAIL TIMELIMIT NODES STATE NODELIST lsf up infinite 6 idle xc[7-120] # lshosts HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES lsfhost.loc SLINUX6 Itanium2 60.0 228 1973M Yes (slurm) xc1 LINUX64 Itanium2 60.0 8 3456M 6143M Yes () xc2 LINUX64 Itanium2 60.0 8 3456M 6143M Yes () xc3 LINUX64 Itanium2 60.0 8 3456M 6143M Yes () xc4 LINUX64 Itanium2 60.
C Setting Up MPICH MPICH, as described on its Web site, http://www-unix.mcs.anl.gov/mpi/mpich1/, is a freely available portable implementation of MPI. This appendix provides the information you need to set up MPICH on an HP XC system. The topics are as follows: • “Downloading the MPICH Source Files” (page 271) • “Building MPICH on the HP XC System” (page 271) • “Running the MPICH Self-Tests” (page 272) • “Installing MPICH” (page 272) See the HP XC System Software User's Guide for information on using MPICH.
NOTE: Building MPICH may take longer than 2 hours. C.3 Running the MPICH Self-Tests Optionally, you can run the MPICH self-tests with the following command: % make testing Two Fortran tests are expected to fail because they are not 64-bit clean. Tests that use ADIOI_Set_lock() fail on some platforms as well, for unknown reasons. C.4 Installing MPICH You can install MPICH on one shared file system in the HP XC system or on individual file systems for those nodes that will run MPICH jobs.
D HP MCS Monitoring You can monitor the optional HP Modular Cooling System (MCS) by using the Nagios interface. During HP XC system installation, you generated an initialization file, /opt/hptc/config/mcs.ini, which specifies the names and IP addresses of the MCS devices. This file is used in the creation of the /opt/hptc/nagios/etc/mcs_local.cfg file, which Nagios uses to monitor the MCS devices.
a. Issue the following command: # /opt/hptc/config/sbin/mcs_config b. Restart Nagios. For more information, see “Stopping and Restarting Nagios” (page 110). 6. Examine the /opt/hptc/config/mcs_advExpected.static.db file to ensure that the values for the MCS advanced setting are appropriate for your site. Restart Nagios if you changed this file. For more information, see “Stopping and Restarting Nagios” (page 110). 7.
D.4 MCS Log Files The following log files contain MCS-related data collected by the check_mcs_trends plug-in: • /opt/hptc/nagios/var/mcs_trends.staticdb This log file tracks the following: — — — — • tempWaterIn waterFlow lastStatus lastCheck /opt/hptc/nagios/var/env_logs/mcs_trends.
Figure D-1 MCS Hosts in Nagios Service Details Window 276 HP MCS Monitoring
Glossary A administration branch The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes of the HP XC system. administration network The private network within the HP XC system that is used for administrative operations. availability set An association of two individual nodes so that one node acts as the first server and the other node acts as the second server of a service. See also improved availability, availability tool.
external network node A node that is connected to a network external to the HP XC system. F fairshare An LSF job-scheduling policy that specifies how resources should be shared by competing users. A fairshare policy defines the order in which LSF attempts to place jobs that are in a queue or a host partition. FCFS First-come, first-served.
Integrated Lights Out See iLO. interconnect A hardware component that provides high-speed connectivity between the nodes in the HP XC system. It is used for message passing and remote memory access capabilities for parallel applications. interconnect module A module in an HP BladeSystem server.
MCS An optional integrated system that uses chilled water technology to triple the standard cooling capacity of a single rack. This system helps take the heat out of high-density deployments of servers and blades, enabling greater densities in data centers. Modular Cooling System See MCS. module A package that provides for the dynamic modification of a user's environment by means of modulefiles. See also modulefile.
PXE Preboot Execution Environment. A standard client/server interface that enables networked computers that are not yet installed with an operating system to be configured and booted remotely. PXE booting is configured at the BIOS level. R resource management role Nodes with this role manage the allocation of resources to user applications. role A set of services that are assigned to a node. Root Administration Switch A component of the administration network.
Index Symbols A adding a local user account, 149 adding a node, 243 adding a service, 71 administrative passwords, 152–156 archive.
DHCP service, 27 diagnostic tools dgemm, 221 Gigabit Ethernet system interconnect, 228 gm_drain_test, 223 gm_prodmode_mon, 222 InfiniBand system interconnect, 227 Myrinet system interconnect, 222 ovp, 215 qsdiagadm, 223 qselantestp, 224 qsnet2_drain_test, 227 qsnet2_level_test, 225 Quadrics system interconnect, 223 swmlogger daemon, 223 sys_check, 215 system interconnect, 222 disabling a node, 53 disk monitoring, 85 displaying system statistics, 85–87 distributing file images to nodes, 129 procedure, 129 do
file system hierarchy, 27 log files, 30 shutdown, 52 startup, 49 hpasm, 85 /hptc_cluster directory, 28, 56, 134, 240–241 guidelines, 28 kernel dump analyzing, 98 obtaining, 98 Load Sharing Facility (see LSF) local storage, 26 local user accounts, 149 adding, 149 deleting, 151 general administration, 149 modifying, 150 locatenode command, 31 log files, 30 logging events, 87 logfiles, 28 Nagios log files, 230 login service, 26 LSF switching type of LSF installed, 182 LSF daemon, 178 LSF documentation, 20 LS
restore, 79–80 management hub services, 26 managing licenses, 75 manpages, 22 MCS, 273–276 log files, 275 MCS cluster monitor, 275 MCS device as Nagios host, 275 monitored by Nagios, 273 status, 273 MCS traps monitor, 275 mcs.ini file, 273 mcs_config command, 274 mcs_local.cfg file, 273 mcs_trends.log file, 275 mcs_trends.
nodename command, 31 nrg command, 31 nrg utility, 107, 109, 121–124 nrpe, 229 NTP, 38, 56, 126 O openipport command, 31, 144 /opt directory, 28 /opt/hp directory, 28 ovp utility, 32, 41, 215–220 P pam_slurm, 165 password changing the root password, 152 console, 40 database administrator, 40 Nagios, 40 ProCurve, 40 superuser, 40, 152 pdsh command, 33, 59 per-node service configuration, 134 performance health tests, 218–220 perfplot utility, 32 ports, 31 NFS, 201 open external, 143 open internal, 143 openin
SFS Nagios host, 102, 105 sftp command, 40 shownode command, 32, 37, 52–53, 57–58, 77–79, 88 shownode metrics command, 85–87 si_cpimage command, 137 Simple Linux Utility for Resource Management (see SLURM) sinfo command, 169, 241 single system view, 38 SLURM, 26, 32, 157 configuration files, 37 deactivating, 174 draining nodes, 170 Pluggable Authentication Module, 165 recognizing new node, 173 removing, 174 troubleshooting, 240 SLURM administration, 157 SLURM configuration, 158 nodes, 160 partitions, 161 se
/usr/bin directory, 28 /usr/local directory, 28 /usr/sbin directory, 28 utility clusplot, 31 cluster_config, 31 collectl, 31 dgemm, 221 gm_drain_test, 223 gm_prodmode_mon, 222 netdump, 96 nrg, 107, 109, 121 ovp, 32, 215 perfplot, 32 qsdiagadm, 223 qselantestp, 224 qsnet2_drain_test, 227 qsnet2_level_test, 225 sys_check, 32, 215 xcxclus, 32 xcxperf, 32 V /var directory, 28 verification test, 32 W Web site HP XC System Software documentation, 18 X xcxclus utility, 32 xcxperf utility, 32 289