HP XC System Software Administration Guide Version 3.
© Copyright 2003, 2004, 2005, 2006 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Table of Contents
About This Document (15); Intended Audience (15); Document Organization (15); HP XC Information ...
3 Managing System Services: HP XC System Services (43); Displaying Services Information (45); Displaying All Services ...
Understanding the Event Logging Structure (74); The syslog-ng.conf Rules File (75)
7 Network Administration: Network Address Translation Administration (77); Network Time Protocol Service ...
Propagating Resource Limits (107); Restricting User Access to Nodes (109); Job Accounting (109); Using the sacct Command ...
Using the dgemm Utility to Analyze Performance (151); Using the System Interconnect Diagnostic Tools (152); HP XC Diagnostic Tools for the Myrinet System Interconnect (152); The gm_prodmode_mon Diagnostic Tool ...
List of Figures
1-1 LVS View of Cluster (30); 1-2 HP XC File System Hierarchy (31); 1-3 HP XC Hierarchy Under /opt/hptc (33); 2-1 Transition of Node States ...
List of Tables
1-1 HP XC System Commands (26); 1-2 Log Files (34); 1-3 Recommended Administrative Tasks (36); 2-1 Node States ...
List of Examples
6-1 Using the collectl Utility from the Command Line (70); 12-1 Using a SLURM Feature to Manage Multiple Node Types (107); 12-2 Draining a Node (113); 12-3 Taking a Node Out of Service ...
About This Document This document describes the procedures and tools that are required to maintain the HP XC system. It provides an overview of the administrative environment and describes administration tasks, node maintenance tasks, Load Sharing Facility (LSF®) administration tasks, and troubleshooting information. An HP XC system is integrated with several open source software components. Some open source software components are being used for underlying technology, and their deployment is transparent.
• Chapter 12: Managing SLURM (page 101) discusses the Simple Linux Utility for Resource Management (SLURM).
• Chapter 13: Managing LSF (page 117) provides a brief overview of the Load Sharing Facility (LSF) application from Platform Computing, Inc. and describes common LSF administration tasks.
• Chapter 14: Managing Modulefiles (page 137) describes how to manage modulefiles.
• Chapter 15.
HP XC Program Development Environment The following URL provides pointers to tools that have been tested in the HP XC program development environment (for example, TotalView® and other debuggers, compilers, and so on): ftp://ftp.compaq.com/pub/products/xc/pde/index.html HP Message Passing Interface HP Message Passing Interface (MPI) is an implementation of the MPI standard for HP systems. The home page is located at the following URL: http://www.hp.
• To view LSF HPC manpages supplied by Platform Computing Corporation, use the following command: $ man lsf_command_name • The LSF HPC Administrator and Reference guides developed by Platform are also available at: http://www.hp.com/techservers/clusters/xc_clusters.html • http://www.llnl.gov/LCdocs/slurm/ Home page for the Simple Linux Utility for Resource Management (SLURM), which is integrated with LSF to manage job and compute resources on an HP XC system. • http://www.nagios.
Using the discover(8) manpage as an example, you can use either of the following commands to display a manpage: $ man discover $ man 8 discover If you are not sure about a command you need to use, enter the man command with the -k option to obtain a list of commands that are related to the keyword. For example: $ man -k keyword Related Information This section provides pointers to the Web sites for related software products and provides references to useful third-party publications.
Related Compiler Web Sites • http://www.intel.com/software/products/compilers/index.htm Web site for Intel® compilers. • http://support.intel.com/support/performancetools/ Web site for general Intel software development information. • http://www.pgroup.com/ Home page for The Portland Group™, supplier of the PGI® compiler.
IMPORTANT   This alert provides essential information to explain a concept or to complete a task.
NOTE        A note contains additional information to emphasize or supplement important points of the main text.
HP Encourages Your Comments
HP encourages your comments concerning this document. We are committed to providing documentation that meets your needs. Send any errors found, suggestions for improvement, or compliments to: feedback@fc.hp.
1. HP XC Administration Environment This chapter introduces the HP XC Administration Environment and discusses key components.
Other nodes are assigned services. Following are examples of services: • Compute Compute nodes run the Simple Linux Utility for Resource Management (SLURM). SLURM enables them to launch, monitor, and control jobs. By default, all nodes in the system are compute nodes. • SLURM central control daemon service The node that provides this service runs the daemon for the SLURM central control.
This service assigns IP addresses for all devices on the internal network based on the DHCP server node's Machine Access Control (MAC) address. • Power daemon service The power subsystem daemon enables you to turn on, turn off, cycle, and reset power on individual nodes, and also report whether a node's power is on or off. This daemon is automatically assigned to the head node. For additional information on nodes and services, see the HP XC System Software Installation Guide.
Table 1-1. HP XC System Commands
cexec: The cexec command is a shell script that invokes the pdsh command to perform commands on multiple nodes in the system. Manpage: cexec(1)
cluster_config: The cluster_config command enables you to view and modify the default role assignments and node configuration, modify the default role assignments on any node, and add, modify, or delete Ethernet connections to any node except the head node.
Command Description shownode This command has a variety of subcommands that provide information about nodes. Manpage: shownode(1) ssh_create_shared_keys The ssh_create_shared_keys command, used on a one-time basis, updates the appropriate ssh key files in the user's $HOME/.ssh directory so they do not need to provide a login name and password each time they log in to another node in the HP XC system or run jobs on those nodes.
pdsh [options] "command ..."
Important: Do not pass a command that requires interaction as an argument to the pdsh command. Prompting from the remote node can cause the command to hang.
The following example runs the uptime command on all the nodes in a four-node system.
# pdsh -a "uptime"
n4: 15:51:40 up 2 days, 2:41, 4 users, load average: 0.48, 0.29, 0.11
n3: 15:49:17 up 1 day, 4:55, 0 users, load average: 0.00, 0.00, 0.00
n2: 15:50:32 up 1 day, 4:55, 0 users, load average: 0. ...
n1: 15:47:21 up 1 ...
Understanding the Configuration and Management Database The HP XC system stores information about the nodes and system configuration in the configuration and management database (cmdb). This is a MySQL database that runs on the node with the node management role. The cmdb is constructed during HP XC system installation. It contains data on hardware and software configuration and on the connectivity of the system.
Figure 1-1. LVS View of Cluster
Network Time Protocol
One node in an HP XC system acts as the internal Network Time Protocol (NTP) server for all the other nodes. By default this is the head node. All other nodes are NTP clients of this server. You can specify where the internal NTP server gets the time. You can specify up to three external time sources if the internal server has a connection to an outside network.
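To confirm that a node is actually synchronizing, you can query the standard NTP daemon directly; these are generic ntpd commands rather than HP XC-specific utilities:
# ntpq -p
# ntptrace
The ntpq -p command lists the node's NTP peers with their offsets and jitter; ntptrace follows the chain of servers back to the reference time source.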
• Set all the cluster nodes as NIS clients. Both the master and slave NIS server are external to the cluster. • Set the head node as a NIS slave (secondary) server. The NIS master server is external to the cluster. Nodes within the cluster use the internal server for NIS information. HP recommends this configuration for larger systems using NIS. Local Storage Local storage for each node holds the operating system, a copy of the HP XC system software, and temporary space that can be used by jobs.
This directory contains two separate subdirectories, one for HP XC System Software and one for other HP software products. Make individual directories for each vendor or software product. /opt/hp /usr /usr/bin /usr/sbin /usr/local /var /tmp Is reserved for optional HP applications and utilities that are not related to the HP XC system. HP Serviceguard for Linux is an example of such a package. Maintains a hierarchy of standard commands and files. Holds the binary executable files that any user can invoke.
Figure 1-3 shows the hierarchy under the /opt/hptc directory. Software packages are shown in boxes with dashed lines. The example directory below the packages contains symbolic links to /opt/hptc/bin, /opt/hptc/sbin, /opt/hptc/lib, and /opt/hptc/man, and these are shown in grey. Figure 1-3. HP XC Hierarchy Under /opt/hptc
Table 1-2. Log Files
Console Management Facility (CMF): /hptc_cluster/adm/logs/cmf.dated/*
collectl utility: /var/log/collectl
LSF-HPC: /var/lsf/log (linked to /opt/hptc/lsf/top/log)
Myrinet® gm_drain_test: /var/log/diag/myrinet/gm_drain_test/
Myrinet gm_prodmode_mon diagnostic tool: /var/log/diag/myrinet/gm_prodmode_mon/links.log
/hptc_cluster/adm/logs/aggregator_nodename.
# module avail • Use the unload keyword to unload a module: # module unload package-name Refer to the HP XC System Software User's Guide for more information about modulefiles. Notes Installing a package in a nondefault location means that you must update the corresponding modulefile; you might need to edit the PATH and MANPATH environment variables. Other changes are based on the software package and its dependencies.
Table 1-3. Recommended Administrative Tasks
When: Once, after initial installation and configuration
Task: Create a system log book for monitoring configuration changes to your system. (No reference available)
Task: Run the ovp command. (Chapter 6: "Monitoring the System", page 61)
Task: Run the sys_check command to establish a baseline.
Task: Run the dgemm command to detect any nodes that are not performing at their peak performance.
2. Starting Up and Shutting Down the HP XC System This chapter describes the information and procedures for starting up and shutting down an HP XC system.
Table 2-1. Node States
Raw_Off: The node requires imaging and is powered off. Turning on the node's power initiates the imaging process.
Raw_On: The node requires imaging and is powered on. The node must be powered off to prepare for imaging.
POST: The node is powered on and should have entered BIOS self-test. The node can be imaged or booted.
Imaging: The node is in the process of imaging. After a successful imaging, the node is placed in the Boot_Ready state.
Starting Up the HP XC System The startsys command implements system startup by booting nodes in the sequence essential for the system to function. The startsys command boots the system nodes in two groups: • Nodes that provide essential system services. This group is booted first. • All other nodes.
2. Invoke the startsys command with the --image_and_boot option: # startsys --image_and_boot For additional information on options that affect imaging, see startsys(8). Restarting a Node for Imaging The following procedure describes how to image and boot a node that is already running. In this instance, the node needs to be imaged to accommodate a recent update to the golden image. 1. 2. Log in as superuser (root) on the head node.
You can request the power status of all nodes by omitting the nodelist parameter. This returns all the nodes whose status is either ON or OFF: # shownode status . . . n2 ON n1 ON Locating a Given Node Most nodes supported on the HP XC system have a Unit Identifier LED that can be lit; not all nodes have this feature. Follow this procedure to light the Unit Identifier LED on a node with this feature, given its nodename: 1. 2. Log in as superuser.
3. Managing System Services This chapter describes the HP XC system services and the procedures for their use. This chapter addresses the following topics: • HP XC System Services (page 43) • Displaying Services Information (page 45) • Restarting a Service (page 47) • Stopping a Service (page 47) • Adding a Service (page 47) • Global System Services (page 53) HP XC System Services A service is a process that monitors a specific condition but is usually otherwise inactive.
Table 3-1. HP XC System Services
Console Management Facility Master: Communicates with node consoles. Database name: cmfd
Database Server: Runs the management database server. Database name: dbserver
DHCP Server: Assigns IP addresses to nodes. Database name: dhcp
Myrinet Switch Monitor: Monitors the Myrinet switch. Database name: gmmon
Hardware Information Gathering: Gathers information about hardware for the management database.
Management hubs are used to aggregate system information for a set of local nodes. Each management hub runs the following services: • Supermon aggregator (supermond) • Syslog-ng aggregator (syslogng_forward) • Console Management Facility (CMF) • Nagios (nagios_monitor) “Displaying Services Information” (page 45) describes commands that provide information on services, that is, how to display a list of services, which services are provided on a given node, and which nodes provide a given service.
You can obtain an extensive list of all services running on a given node by invoking the following command: # service --status-all Displaying the Nodes That Provide a Specified Service You can use the shownode servers command to display the node or nodes that provide a specific service to a given node. You do not need to be superuser to use this command.
nagios_monitor: n3 nsca: n3 ntp: n3 slurm: n3 supermond: n3 syslogng_forward: n3 The shownode services node server command displays no output if no server exists. For more information, see shownode(8). Restarting a Service Use the following command line syntax to restart an individual service on the current node: # service servicename restart Note Some services do not provide a restart option. For such services, you must issue a service stop command followed by a service start command.
[global]
roles = <role names>
service_recommended = value
optimal_scale_out = value
external_connection_required = value
external_connection_desired = value
exclusivity_desired = value
These parameters are explained as follows:
service_recommended: Assigning 1 to this parameter indicates that the service is part of the default configuration. The cluster_config utility recommends the assignment of nodes to this service. Alternately, assigning 0 means that the cluster_config utility does not interpret the service as part of the default configuration.
optimal_scale_out: Assign the scale-out value to this parameter.
external_connection_desired: Assigning 1 to this parameter indicates that an external connection is beneficial, but not necessary. Assigning 0 indicates the service does not require an external connection.
IMPORTANT: Ensure that you do not assign 1 to both the external_connection_required and external_connection_desired parameters. If you assign 1 to external_connection_desired, assign 0 to the external_connection_required parameter.
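As an illustration only, the following filled-in [global] section is hypothetical; the role name and the numeric values are invented for the example and are not taken from any shipped service.ini file:
[global]
roles = compute
service_recommended = 1
optimal_scale_out = 32
external_connection_required = 0
external_connection_desired = 1
exclusivity_desired = 0
In this sketch the service is part of the default configuration, one server is recommended for roughly every 32 nodes, and an external connection is desirable but not required.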
An N in these columns indicates a note; a W, indicates a warning. When notes and warnings are present, more detailed descriptions are displayed immediately after the table. Adding a Service to an Existing Role IMPORTANT Perform this procedure only when you have exclusive use of the HP XC system and when no jobs are running on the system because the procedure requires that you reimage nodes. This procedure adds a service called newservice to the common role: 1. 2. Log in as the superuser on the head node.
You may need to use other options to the discover command, depending on your system. See discover(8). 10. Run the cluster_config utility: # /opt/hptc/config/sbin/cluster_config Be sure to use the analyze command and review the output. Ensure that the service's role is assigned the correct number of servers. If the assigned number of servers is insufficient, edit the service.ini file accordingly and repeat from step 7. 11. Reimage the applicable node or nodes. This is described in Chapter 8.
6. Update the configuration and management database: # reset_db 7. Restart and initialize the cmdb with the cluster_prep command: # /opt/hptc/config/sbin/cluster_prep 8. Execute the discover command to discover network components: # /opt/hptc/config/sbin/discover --system Note You may need to use other options to the discover command, depending on your system. See discover(8). 9.
4. Managing Licenses This chapter describes the following topics: • License Manager and License File (page 55) • Determining If the License Manager Is Running (page 55) • Starting and Stopping the License Manager (page 55) License Manager and License File The license manager service runs on the head node and maintains licensing information for software on the HP XC system. You can find additional information on the FLEXlm license manager at the Macrovision Web site: http://www.macrovision.
# service hptc-lm start Stopping the License Manager Use the following command to stop the license manager: # service hptc-lm stop Restarting the License Manager Use the following command to restart the license manager: # service hptc-lm restart 56 Managing Licenses
5. Managing the Configuration and Management Database The configuration and management database, cmdb, is key to the configuration of the HP XC system. It keeps track of which nodes are enabled or disabled, the services that a node provides, the services that a node receives, and so on.
hwaddr: ipaddr: level: location: netmask: region: cp-n2: cp_type: host_name: hwaddr: ipaddr: level: location: netmask: region: 00:e0:8b:01:02:03 172.21.0.1 1 Level 1 Switch 172.20.65.2, Port 40 255.224.0.0 0 IPMI cp-n2 00:e0:8b:01:02:04 172.21.0.2 1 Level 1 Switch 172.20.65.2, Port 41 255.224.0.0 0 . . .
This command output indicates that the node n3 supplies the ntp service for itself. Finding and Setting System Attribute Values The dbsysparams command finds the value of a specified attribute or sets an attribute to a given value in the hptc_system table of the cmdb. CAUTION Do not reset any database parameters. Doing so could interfere with the integrity of the database. If this happens, you must reinstall the HP XC system and any additional software.
Time Parameter Archive (or Purge) All Metrics Data: now Regardless of age (default) nh Older than n hours nd Older than n days nw Older than n weeks ny Older than n years The following example archives metrics data older than four days to an archive file named arfile_tuesday in the current directory: # managedb archive --archive ./arfile_tuesday 4d Be sure to archive the configuration and management database on a regular basis (at least monthly) to save disk space.
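One way to make the monthly archive routine is to schedule it from the head node's root crontab. This is only a sketch: the archive directory is a placeholder you would replace with a location of your choice, the "4w" time parameter comes from the table above, and cron may require the full path to the managedb command on your system.
# Run at 02:00 on the first day of each month; archive metrics older than four weeks.
0 2 1 * * managedb archive --archive /hptc_cluster/adm/archives/cmdb_archive 4w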
6. Monitoring the System System monitoring can identify situations before they become problems. This chapter addresses the following topics: • Monitoring Strategy (page 61) • Monitoring Tools (page 61) • Displaying System Environment Data (page 71) • Displaying System Statistics (page 72) • Logging Node Events (page 74) Monitoring Strategy The HP XC system monitoring strategy is built on Open Source tools that are configured automatically to provide a seamless integration with the HP XC system.
• top • uptime • vmstat • w You can use these administrative commands from any node to determine the health of an individual node. Information for these commands is available from their corresponding manpages. The HP XC system also includes the Nagios Web-based utility for system monitoring and such commands as the shownode command. These are discussed in this chapter.
Table 6-1. Services Monitored by Nagios
Apache HTTPS Server: Monitors the Web server providing the Nagios Web interface
Configuration Monitor: Periodically generates and updates configuration display information for all nodes in the HP XC system (see "configuration" below)
configuration: Configuration information reported for this node
Environment: Report on this node's sensor status. Depending on the node type, all available "live" sensors are reported.
Figure 6-2. Nagios Main Window
You can choose any of the options on the left navigation bar. These options are shown in Figure 6-3.
Figure 6-3. Nagios Menu (Truncated)
After you choose an option from the window, you are prompted for a login and a password. This login and password are established when the HP XC system is configured. Usually, the login name is nagiosadmin. The Nagios passwords are maintained in the /opt/hptc/nagios/etc/htpasswd.users file. Use the htpasswd command to manipulate this file to add a user, to delete a user, or to change the user password.
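For example, the standard Apache htpasswd utility can add a user to this file or change an existing user's password; the user name shown here is only an illustration:
# htpasswd /opt/hptc/nagios/etc/htpasswd.users operator1
The command prompts twice for the new password; if operator1 already exists in the file, its password is updated.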
Nagios offers various views of the HP XC system:
• Tactical Overview
• Status Map
• Service Detail
• 3-D Status Map
• Host Detail
• Service Problems
• Hostgroup Overview
• Host Problems
• Hostgroup Summary
• Network Outages
• Hostgroup Grid
• Downtime
• Servicegroup Overview
• Process Info
• Servicegroup Summary
• Performance Info
• Servicegroup Grid
• Scheduling Queues
For administrators of systems comprised of tens of nodes, the Service Detail view provides a good overview of the system.
Figure 6-5. Nagios Service Problems View
Both views offer color coding so that you can detect problems at a glance.
Note: The term Hosts on the Nagios window refers to any entity with an IP address, not just nodes. For example, Nagios monitors the four nodes and two switches in an HP XC system, and reports on the status of six hosts. SFS is also an example of a Nagios host; Nagios finds the name of the SFS server and displays its data.
Nan Notification Aggregator and Delimiter
The HP XC System Software incorporates the Nan notification aggregator and delimiter for the Nagios paging system. Nan is an open source supplement to the Nagios application. Nagios is capable of sending large quantities of messages, especially when the system is starting up, shutting down, or experiencing a failure.
The syslogng_forward service on each regional node enables the node to act as a log aggregator for the global node. Log information is gathered, consolidated, and forwarded to the global node; the global node is not necessarily the head node. Nagios has a syslog plug-in, check_syslogAlerts, that applies a set of rules against all the events in the consolidated log file and generates alerts for those events that match one of the rules. The rules reside in the /opt/hptc/nagios/etc/syslogAlertRules.
Example 6-1. Using the collectl Utility from the Command Line
# collectl
waiting for 10 second sample...
### RECORD 1 >>> n3 <<< (m.n) (date and time stamp) ###
# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER NICE SYS IDLE WAIT INTR CTXSW PROC RUNQ RUN AVG1 AVG5 AVG15
    0    0   0   99    0 1055    65    0  151   0 0.02 0.04 0.
By default, the collectl service gathers information on the following subsystems: • CPU • Disk • Inode and file system • Lustre file system • Memory • Networks • Sockets • TCP • Interconnect The collectl(1) manpage discusses running the collectl utility as a service. Running the collectl Utility in a Batch Job Submission You can run the collectl utility as one job in a batch job submission.
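The following is only a sketch of one way to do that, assuming a simple shell batch script submitted with bsub; my_app, the output path, and the use of srun are placeholders for your own workload, and collectl is invoked exactly as shown earlier in this chapter:
#!/bin/sh
# Record system metrics for the duration of the job.
collectl > /tmp/collectl-$$.out 2>&1 &
COLLECTL_PID=$!
# Run the actual workload (my_app is a placeholder).
srun ./my_app
# Stop the collector when the workload completes.
kill $COLLECTL_PID
Submitted this way, the script captures collectl output for the same interval in which the application runs on the node where the batch script executes.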
For additional information about the Nagios Service Status for All Hosts window and related topics, select Documentation on the Nagios menu, or visit the Nagios Web site: http://www.nagios.org . You can also use the shownode metrics sensors command to display environmental data. See “Displaying System Sensors from the Command Line” (page 72) for more information. Depending on the platform, there may be tools that allow you to collect information specific to the platform.
date time |n512 |Sensor count |22 |All sensors within thre...
date time |n511 |Sensor count |22 |All sensors within thre...
date time |n510 |Sensor count |22 |All sensors within thre...
date time |n509 |Sensor count |25 |All sensors within thre...
date time |n1   |Sensor count |22 |All sensors within thre...
System Monitoring with the Nagios GUI The Nagios GUI displays a series of windows that provide system statistics.
The syslog-ng.conf Rules File
The syslog-ng.conf rules file defines the order of importance by which the log files are arranged. The /opt/hptc/syslog-ng/etc/syslog-ng.conf/syslog-ng.conf file defines for the syslogng_forward service a series of rules on how to handle messages from its clients. The syslog-ng.conf file contains five types of rules:
Options: Defines generic information like reconnection timeouts, FIFO size limits, and so on.
Sources: Defines where messages are read from, such as local log sockets or the network.
Filters: Selects messages based on criteria such as facility, priority, or host.
Destinations: Defines where matching messages are written, such as files or remote log hosts.
Logs: Ties sources, filters, and destinations together into processing paths.
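The following fragment is a generic illustration of how these five rule types fit together in syslog-ng syntax; it is not a copy of the HP XC syslog-ng.conf file, and the names (s_remote, f_kern, d_consolidated) and the log path are invented for the example:
options { sync(0); time_reopen(10); };
source s_remote { udp(); internal(); };
filter f_kern { facility(kern); };
destination d_consolidated { file("/var/log/example/kern.log"); };
log { source(s_remote); filter(f_kern); destination(d_consolidated); };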
7. Network Administration This chapter addresses the following network topics: • Network Address Translation Administration (page 77) • Network Time Protocol Service (page 78) Network Address Translation Administration Network Address Translation (NAT) is administered on the head node only when one or two nodes are configured with the external role as NAT servers. When three or more nodes (other than the head node) are configured as NAT servers, these nodes relieve the head node of NAT duty.
This command flushes each client node's IP routing cache and generates traffic on its primary default gateway (its preferred server) in an attempt to influence the selected default gateway. NOTE: This implementation of NAT prevents the use of NFS locking. The GATEWAY= field of the /etc/sysconfig/network file, which provided network configuration in previous releases of the HP XC System Software, is no longer used. Use the ip route show command to display a node's routing information.
8. Distributing Software Throughout the System
Adding Software or Modifying Files on the Golden Client The first step in managing software changes to your HP XC system is to update the golden client node. This can involve adding new software packages, adding a new user, or modifying a configuration file that is replicated across the HP XC system, such as a NIS or NTP configuration file. Note It is important to have a consistent procedure for managing software updates and changes to your HP XC system.
1. Create an “overrides” subdirectory, named compiler, to contain the compiler package:
# cd /var/lib/systemimager/overrides
# mkdir ./compiler
There should be two subdirectories, base_image and compiler, under the ./overrides directory.
# ls -F
base_image/  compiler/  README
2. Install the compiler package into the alternate root location, /var/lib/systemimager/overrides/compiler.
# rpm -ivh --root \
/var/lib/systemimager/overrides/compiler compilername.rpm
3.
Using Per-Node Service Configuration The HP XC system configuration process uses per-node configuration scripts to achieve personalized role configurations as necessary on each node. The per-node configuration process occurs initially during HP XC system configuration, at the time each client node is auto-installed.
# tail -f /hptc_cluster/adm/logs/imaging.log The per-node configuration scripts log their execution locally to the /var/log/nconfig.log file.
golden_image_md5sum=2e22d69e5c8b0bc0570b0dfffe5883a1 golden_image_modification_time=date and time stamp Updating the Golden Image Before you can deploy your software and configuration updates throughout the HP XC system, you must update the golden image to synchronize with these changes. The golden image that is created during the initial HP XC system configuration process is named base_image, and it exists in the file system hierarchy under the directory /var/lib/systemimager/images.
When the cluster_config command completes, the golden image is synchronized with the golden client. You are ready to deploy the golden image to all the nodes in your HP XC system. Note Nodes that have had their configuration changed are set to network boot. This causes the nodes to reinstall themselves automatically, thus receiving the latest golden image. For nodes to be set to network boot, the nodes must be operational.
Used during the initial creation of the golden image, which occurs as a result of executing the cluster_config command. The golden client has very little personality at this time, so this exclude file is fairly sparse. After the initial golden image is created from the golden client, the golden client is configured, and it takes on its node-specific personality. Any subsequent update to the golden image should exclude those node-specific files from contaminating the golden image.
Using the Full Imaging Installation A recommended procedure to propagate the golden image is to install all client nodes automatically. This ensures that they receive the updated image and any updated configuration information automatically. When all nodes are set to network boot, a reboot of each client node starts the automatic installation. After each node completes its installation, it automatically reboots and is available for service.
# service nconfig nconfigure # service nconfig restart 6. Run the following commands to run the per-node configuration service on all the client nodes: # cexec -x `nodename` -a "service nconfig nconfigure" # cexec -x `nodename` -a "service nconfig restart" Using the cexec Command You can use the cexec command to copy designated files to client nodes.
6. Update the /opt/hptc/systemimager/etc/chkconfig.map file by adding the new service names, with the service orientation OFF for all run levels. The nconfig service, the configuration script, or both, enable the service only on those nodes on which it is intended to run.
7. Execute all gconfig scripts and update the golden image:
# /opt/hptc/config/sbin/cluster_config
8.
9. Opening an IP Port in the Firewall This chapter addresses the following topics: • Open Ports (page 91) • Opening Ports in the Firewall (page 92) Open Ports Each node in an HP XC system is set up with an IP firewall, for security purposes, to block communications on unused network ports. External system access is restricted to a small set of externally exposed ports. Table 9-1. lists the base ports that are always open by default; these ports are labeled “External”.
Table 9-2.
The following example opens port 44 in the firewall for the udp protocol on the Admin, Interconnect, and loopback interfaces on the current node. The --verbose option displays error messages, if any. Notes The commands in the following examples use line continuation with the backslash character (\) to fit the commands horizontally on the page. You can enter these commands on one line. The list of interfaces specified by the --interface option must not contain any space characters.
# set up port 389 on Interconnect interface: -A RH-Firewall-1-INPUT -i Interconnect -p tcp -m tcp --dport 389 -j ACCEPT # setup port 389 on admin interface -A RH-Firewall-1-INPUT -i Admin -p tcp -m tcp --dport 389 -j ACCEPT -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited 3. Optionally, enter the following command to open the port on all nodes in the HP XC system until the nodes are reimaged: Note This command was entered using the backslash character (\) to continue it on another line.
10. Connecting to a Remote Console This chapter addresses the following topics: • Console Management Facility (page 95) • Accessing a Remote Console (page 95) Console Management Facility The Console Management Facility (CMF) daemon runs on the head node. It collects and stores console output for all other nodes on the system. This information is stored individually for every node and is backed up periodically. This information is stored under dated directories in the /opt/hptc/cmf/logs/cmf.
logging the console data for the node. If command-line mode is entered, use the key sequence to exit it to restore console data logging by CMF. Table 10-1.
11. Managing Local User Accounts This chapter describes how to add, modify, and delete a local user account on the HP XC system.
Note A customary practice is to assign a temporary password that the user changes with the passwd command, but this data must be propagated to all the other system nodes also. See “Distributing Software Throughout the System” (page 79) for more information. • User identifier number (UID) A default value is assigned if you do not supply this information; this value is usually the next available UID. If you assign a value, first make sure that it is not already in use.
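For example, a local account can be created on the golden client with the standard Linux tools; the user name, UID, and comment below are placeholders:
# useradd -m -u 1500 -c "Example User" jdoe
# passwd jdoe
After the account exists on the golden client, update the golden image and client nodes as described in "Distributing Software Throughout the System" (page 79) so that the new account information reaches every node.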
Deleting a Local User Account Remove a user account with the userdel command; you must be superuser on the golden client node to use this command. This command provides the -r option, which removes that user's home directory, all the files in that directory, and the user's mail spool. Make sure that you propagate these changes to all the other nodes in the system by using the si_getimage and si_updateclient commands, as described in “Distributing Software Throughout the System” (page 79).
1. Log in as superuser on the head node.
2. Make a copy of the root crontab file into a temporary file named /tmp/root_crontab:
# crontab -l >/tmp/root_crontab 2>/dev/null
3. Use the text editor of your choice as follows:
a. Open the temporary file.
b. Append the following lines to the temporary file:
# Download NIS maps according to update frequency.
20 * * * * ...
40 6 * * * ...
55 6,18 * * * ...
c.
4.
12. Managing SLURM The HP XC system uses the Simple Linux Utility for Resource Management (SLURM).
Configuring SLURM The HP XC system provides global and local directories for SLURM files: • The /hptc_cluster/slurm directory is the sharable location for SLURM files that need to be shared between the nodes. The SLURM slurmctld state files, job logging files, and the slurm.conf configuration file reside there. • The location for SLURM files that should remain local to a given node is /var/slurm; the files in this directory are not shared between nodes.
The following general parameters are configured: • The MaxJobCount parameter is based on the number of CPUs in the HP XC system and the number of preemption queues to be used in LSF to ensure that allocations are available for LSF jobs. The default value is 2000 jobs. • The MinJobAge is set to a value (1 hour or greater) that provides LSF enough time to obtain job status information after it finishes. The default value is 300 seconds. A value of zero prevents any job record purging.
Configuring Nodes in SLURM
You can change the configuration of a set of nodes by editing the slurm.conf file. SLURM enables you to describe various node characteristics on your system. SLURM uses this description to enable your users to select an optimal set of nodes for their jobs.
Node Characteristics
The following characteristics are useful on an HP XC system:
Feature: Alphanumeric text with meaning in the local environment.
RealMemory: The amount of real memory on the node, in megabytes.
Your HP XC system is configured initially with all compute nodes in a single SLURM partition, called lsf. In some situations, you might want to remove some nodes from the lsf partition and manage them directly with SLURM, submitting jobs to those nodes with the srun --partition=partition-name command. LSF manages only one partition.
However, you might prefer not to run jobs on the head node, n128. Simply modify the line to the following: PartitionName=lsf RootOnly=yes Shared=FORCE Nodes=n[1-127] Consider an academic system with 256 nodes. Suppose you would like to allocate half the system for faculty use and half for student use. Furthermore, the faculty prefers the order and control imposed by LSF, while the students prefer to use the srun command.
Example 12-1. Using a SLURM Feature to Manage Multiple Node Types Using a SLURM Feature to Manage Multiple Node Types 1. Use the text editor of your choice to edit the slurm.conf file to change the node configuration to the following: NodeName=exn[1-64] Procs=2 Feature=single,compute NodeName=exn[65-96] Procs=4 Feature=dual,compute NodeName=exn[97-98] Procs=4 Feature=service Save the file. 2. Update SLURM with the new configuration: # scontrol reconfig 3.
cpu time max user processes virtual memory file locks (seconds, -t) (-u) (kbytes, -v) (-x) unlimited 8113 unlimited unlimited Only soft resource limits can be manipulated. Soft and hard resource limits differ.
. . Can't propagate RLIMIT_CORE of 100000 from submit host. For more information, see slurm.conf(5). Restricting User Access to Nodes Although full user authentication is required on every node so that SLURM can launch jobs, and although this access is beneficial for users who need to debug their applications, it can be a problem because one user could adversely affect the performance of another user's job: a user could log on any compute node and steal processor cycles from any job running on that node.
• First nonzero error code returned by jobstep • Sum of system processor time and user processor time Note These statistics are gathered after each jobstep completes. Using the sacct Command The sacct command enables you to analyze the system accounting data collected in the job accounting log file. As the superuser, you can examine the accounting data for all jobs and job steps recorded in the job accounting log.
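For example, once job accounting is enabled, the standard sacct options can summarize a finished job; the job ID below is only a placeholder:
# sacct -j 1234
This reports the recorded fields (such as job name, exit status, and processor time) for job 1234 and each of its job steps; see sacct(1) for the full list of options.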
c. Verify that this portion of the slurm.conf file resembles the following (the changes are shown in bold): . . . # # o Define the job accounting mechanism # JobAcctType=jobacct/none # # o Define the location where job accounting logs are to # be written. For # - jobacct/none - this parameter is ignored # - jobacct/log - the fully-qualified file name # for the data file # JobAcctLoc=/hptc_cluster/slurm/job/jobacct.log . . . d. 3. Save the file.
You can set this parameter to a value indicating the polling interval (in seconds). The majority of the jobs run should have longer run times than this value. The default value is 30. Setting this value to 0 causes the sacct command always to report values of 0 for psize and vsize. 4. Use the text editor of your choice to edit the /hptc_cluster/slurm/etc/slurm.conf file as follows: a. Locate the parameter JobAcctType: JobAcctType= b.
# for the data file # JobAcctLoc=/hptc_cluster/slurm/job/jobacct.logJobAcctParameters="Frequency=10" . . . g. 5. Save the file. Restart the slurmctld and slurmd daemons: # cexec -a "service slurm restart" Monitoring SLURM The SLURM squeue, sinfo, and scontrol utilities and the Nagios system monitoring utility provide the means for monitoring and controlling SLURM on your HP XC system.
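As a quick illustration (these are the standard SLURM commands, not HP XC-specific tools, and the node name is a placeholder), the following commands show partition, queue, and per-node status:
# sinfo
# squeue
# scontrol show node n15
The sinfo command summarizes partitions and node states, squeue lists pending and running jobs, and scontrol show node reports the detailed state of a single node.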
Example 12-3. Taking a Node Out of Service
# scontrol update nodename=n17 state=down
The scontrol command returns nodes to an IDLE state so that they can be reused. Example 12-4 places n17 in the IDLE state to return it to service:
Example 12-4. Returning a Node to Service
# scontrol update NodeName=nodelist State=idle
When returning a node to service, HP recommends that you set the state to DRAIN, even if no jobs are currently running.
Table 12-4.
EPILOG_EXCLUDE_NODES="n101 n102 n103 n104 n105" The SLURM epilog is located at /opt/hptc/slurm/etc/slurm.epilog.clean initially. You can maintain the file in this directory, move it to another directory, or move it to a shared directory. If you decide to maintain this file in a local directory on each node, be sure to propagate the SLURM epilog file to all the nodes in the HP XC system. The following example moves the SLURM epilog file to a shared directory: # mv /opt/hptc/slurm/etc/slurm.epilog.
13. Managing LSF The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities.
each node. Then the /etc/profile.d/lsf.sh and /etc/profile.d/lsf.csh files, which reference the appropriate source file upon login, are created. Finally, LSF is configured to start when the HP XC system boots up: a soft link from /etc/init.d/lsf to the lsf_daemons startup script provided by LSF is created. All this configuration optimizes the installation of LSF on HP XC. The following LSF commands are particularly useful:
• The bhosts command is useful for viewing LSF batch host information.
• To signal user jobs and cancel allocations. • To gather user job accounting information. The major difference between LSF-HPC and Standard LSF is that LSF-HPC daemons run on only one node in the HP XC system, that node is known as the LSF execution host. The LSF-HPC daemons rely on SLURM to provide information on the other computing resources (nodes) in the system.
For example, consider an LSF-HPC configuration in which node n20 is the LSF-HPC execution host and nodes n[1-10] are in the SLURM lsf partition. The default normal queue contains the job starter script, but the unscripted queue does not have the job starter script configured. Example 13-2. Comparison of Queues and the Configuration of the Job Starter Script Comparison of Queues and the Configuration of the Job Starter Script $ bqueues -l normal | grep JOB_STARTER JOB_STARTER: /opt/hptc/lsf/bin/job_starter.
SLURM lsf Partition An lsf partition is created in SLURM; this partition contains all the nodes that LSF-HPC manages. This partition must be configured such that only the superuser can make allocation requests (RootOnly=YES). This configuration prevents other users from directly accessing the resources that are being managed by LSF-HPC. The LSF-HPC daemons, running as the superuser, make allocation requests on behalf of the owner of the job to be dispatched.
Installation of LSF-HPC on SLURM When selected, LSF-HPC is automatically installed during cluster_config execution. This installation is optimized for operational scalability and efficiency within the HP XC system, and is a very good solution for the HP XC system. Depending how you manage your overall LSF cluster file system, this installation is sufficient for adding the HP XC system to an existing LSF cluster.
/opt/hptc/lsf/top/conf/hosts file maps lsfhost.localdomain and its virtual IP to the designated LSF execution host • An initial LSF-HPC hosts file to map the virtual host name (lsfhost.localdomain) to an actual nodename is provided. • Sets the default LSF-HPC environment for all users who log into the HP XC system. Files named lsf.sh and lsf.csh are added to the /etc/profile.d/ directory; these files source the respective /opt/hptc/lsf/top/conf/profile.lsf and /opt/hptc/lsf/top/conf/cshrc.lsf files.
service lsf status This command reports the current state (UP or DOWN) of LSF-HPC. This command has the same function as controllsf status. Load Indexes and Resource Information LSF-HPC gathers limited resource information and load indexes from the LSF execution host and from its integration with SLURM. Not all indexes are reported because SLURM does not provide the same information that LSF-HPC usually reports.
Launching Jobs with LSF-HPC You may not submit LSF-HPC jobs as superuser (root). You may find it convenient to run jobs as the local lsfadmin user. An example would be a job to test a new queue configuration. The LSF-HPC daemons run on one node only: the LSF execution host. Therefore, they can dispatch jobs only on that node.
Example 13-7. Basic Job Launch with the JOB_STARTER Script Configured
$ bsub -I hostname
Job <24> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n99
Monitoring and Controlling LSF-HPC Jobs
All the standard LSF commands for monitoring a job are supported. The bjobs command reports the status of a job.
The bstop command suspends the execution of a running job. The bresume command resumes the execution of a suspended job. For more information, see bkill(1), bstop(1), and bresume(1). Job Accounting Standard LSF job accounting using the bacct command is available. The output of a job contains total CPU time and memory usage: $ cat 231.out . . . Resource usage summary: CPU time : Max Memory : Max Swap : 8252.65 sec. 4 MB 113 MB . . .
LSF-HPC monitoring and failover are implemented on the HP XC system as tools that prepare the environment for the LSF-HPC execution host daemons on a given node, start the daemons, then watch the node to ensure that it remains active. After a standard installation, the HP XC system is initially configured so that: • LSF-HPC is started on the head node. • LSF-HPC failover is disabled. • The Nagios application reports whether LSF-HPC is up, down, or "currently shut down," but takes no action in any case.
controllsf set primary nodename Specifies that LSF-HPC should start on some node other than the head node by default. You can also change the selection of the primary and backup nodes for the SLURM control daemon by editing the SLURM configuration file, /hptc_cluster/slurm/etc/slurm.conf. LSF-HPC Failover and Running Jobs In the event of an LSF-HPC failover, LSF-HPC terminates each job that was previously running. These jobs finish with an exit code of 122.
LSF-HPC Enhancement Settings
Table 13-3 describes the environment variables in the lsf.conf file that you can use to enhance LSF-HPC.
Table 13-3. Environment Variables for LSF-HPC Enhancement (lsf.conf File)
LSB_RLA_PORT=port_number: This entry specifies the TCP port used for communication between the LSF-HPC allocation adapter (RLA) and the SLURM scheduler plug-in. The default port number is 6883.
Environment Variable Description LSF_HPC_EXTENSIONS="ext_name,..." This setting enables Platform LSF-HPC extensions. This setting is undefined by default. The following extension names are supported: • SHORT_EVENTFILE This compresses long host name lists when event records are written to the lsb.events and lsb.acct files for large parallel jobs. The short host string has the format: number_of_hosts*real_host_name When SHORT_EVENTFILE is enabled, older daemons and commands (pre-LSF Version 6.
Environment Variable Description LSF_NON_PRIVILEGED_PORTS=Y|y Some LSF-HPC communication can occur through privileged ports. This setting disables privileged ports usage ensuring that no communication occurs through privileged ports. Disabling privileged ports helps to ensure system security. By default, LSF daemons and clients running under the root account use privileged ports to communicate with each other. If LSF_NON_PRIVILEGED_PORTS is undefined, and if LSF_AUTH is not defined in lsf.
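As an illustration only (the values are the defaults or examples given in this table, not a tuning recommendation), a fragment of lsf.conf combining several of these settings might read:
LSB_RLA_PORT=6883
LSF_HPC_EXTENSIONS="SHORT_EVENTFILE"
LSF_NON_PRIVILEGED_PORTS=Y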
Table 13-4. Environment Variables for LSF-HPC Enhancement (lsb.queues File)
DEFAULT_EXTSCHED=SLURM[options[;options]...]: This entry specifies SLURM allocation options for the queue. The -ext options to the bsub command are merged with DEFAULT_EXTSCHED options, and -ext options override any conflicting queue-level options set by DEFAULT_EXTSCHED.
Thresholds in LSF-HPC-SLURM Interplay When the HP XC system starts, some computer nodes may take a while to boot. If LSF-HPC starts to report the current number of processors before the system stabilizes, the smaller jobs that are already queued are scheduled. It may be better to run a larger job requesting more processors.
# controllsf set virtual hostname xclsf # controllsf show LSF is currently shut down, and assigned to node . Failover is disabled. Head node is preferred. The primary LSF host node is n128. SLURM affinity is enabled. The virtual hostname is "xclsf". 6. 7. 8. Edit the $LSF_ENVDIR/lsf.cluster.cluster_name file and change the LSF-HPC virtual host name to the new one in the HOSTS section. Edit $LSF_ENVDIR/hosts file to remove the old LSF-HPC virtual host name entry.
14. Managing Modulefiles This chapter describes how to load, unload, and examine modulefiles. Modulefiles provide a mechanism for accessing software commands and tools, particularly for third-party software. The HP XC System Software does not use modules for system-level manipulation. A modulefile contains the information that alters or sets shell environment variables, such as PATH and MANPATH. Some modulefiles are provided with the HP XC System Software and are available for you to load.
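For reference, a minimal modulefile is a short Tcl script. The sketch below is a generic example for a hypothetical package installed under /opt/example; it is not one of the modulefiles shipped with the HP XC System Software:
#%Module1.0
## Example modulefile for a hypothetical package in /opt/example
proc ModulesHelp { } {
    puts stderr "Adds the example package to PATH and MANPATH"
}
module-whatis   "example package (illustrative only)"
prepend-path    PATH         /opt/example/bin
prepend-path    MANPATH      /opt/example/man
setenv          EXAMPLE_HOME /opt/example
If a package is later moved to a different installation directory, only the paths in this one file need to change for users who load the module.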
15. Mounting File Systems This chapter provides information and procedures for performing tasks to mount file systems that are internal and external to the HP XC system.
Example 15-1. Unedited fstab.proto File Unedited fstab.proto File # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
#% n[60-63] The file systems can be either of the following: • External to the node, but internal to the HP XC system. “Mounting Internal File Systems Throughout the HP XC System” (page 141) describes this situation. The use of csys is strongly recommended. For more information, see csys(5). • External to the HP XC system. “Mounting Remote File Systems” (page 144) describes this situation. NFS mounting is recommended for remote file system mounting.
Understanding the csys Utility in the Mounting Instructions The csys utility provides a facility for managing file systems on a clusterwide basis. It works in conjunction with the mount and umount commands by providing a pseudo file system type. The csys utility is documented in csys(5). The syntax of the fstab entry is as follows: fsname mountpoint fstype options fsname Specifies the device name of the file system to be mounted. mountpoint Specifies where to mount the file system on the mounting host.
The node that exports the file system to the other nodes in the HP XC system must have the disk_io role. 3. Determine whether you want to mount this file system over the administrative network or over the system interconnect. As a general rule, specify the administration network for administrative data and the system interconnect for application data. 4. Edit the /hptc_cluster/etc/fstab.proto file as follows: a. Locate the node designator that specifies the node or nodes that will mount the file system.
Example 15-2. The fstab.proto File Edited for Internal File System Mounting The fstab.proto File Edited for Internal File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
Figure 15-2. Remote File System Mounting Example Remote File System Mounting Example HP XC Cluster . . . n21 n22 External Server n23 xeno /extra n24 n25 . . . Understanding the Mounting Instructions The syntax of the fstab entry for remote mounting using NFS is as follows: exphost:expfs mountpoint fstype options exphost Specifies the external server that is exporting the file system. The exporting host can be expressed as an IP address or as a fully qualified domain name.
1. Determine which file system to export. In this example, the file system /extra is exported by the external server xeno. 2. Ensure that this file system can be NFS exported. Note This information is system dependent and is not covered in this document. Consult the documentation for the external server. 3. 4. Log in as superuser on the head node. Ensure that the mount point directory exists on all the nodes that will mount the remote file system.
Example 15-3. The fstab.proto File Edited for Remote File System Mounting The fstab.proto File Edited for Remote File System Mounting # This file contains additional file system information # for all # the nodes in the HP XC cluster. When a node # boots, this file will be parsed and from it a new # /etc/fstab will be created. # # Do not edit /etc/fstab! Edit this file # (/hptc_cluster/etc/fstab) instead.
16. Using Diagnostic Tools This chapter discusses the diagnostic tools that the HP XC system provides. It addresses the following topics: • Using the sys_check Utility (page 149) • Using the ovp Utility for System Verification (page 149) • Using the dgemm Utility to Analyze Performance (page 151) • Using the System Interconnect Diagnostic Tools (page 152) Troubleshooting procedures are described in Chapter 17.: Troubleshooting (page 159).
For a complete list of verification tests, see ovp(8). Note Run the ovp command only when you have exclusive use of the HP XC system and no jobs are running on the system. Normally you do not need to run this utility after you run it at the end of the HP XC system installation; however, it is recommended that you run this utility regularly to verify the health of the HP XC system.
See ovp(8) for more information about the ovp command and its options. Using the dgemm Utility to Analyze Performance You can use the dgemm utility, in conjunction with other diagnostic utilities, to help detect nodes that may not be performing at their peak performance. When a processor is not performing at its peak efficiency, the dgemm utility displays a WARNING message.
The max parameter is the maximum number of processors available to you in the lsf partition. No warning messages appear when all the specified nodes are performing at their peak efficiency. Using the System Interconnect Diagnostic Tools Various tools enable you to diagnose the system interconnect. Some tools are provided by the system interconnect manufacturer and are discussed in the Installation and Operation Guide (the hardware documentation) for your system.
The gm_drain_test Diagnostic Tool
This diagnostic tool runs five tests for the Myricom® switches in an HP XC system. You must launch it from the head node and run it only during allocated preventive maintenance. The five tests are as follows:
gm_prodmode_mon    Examines environmental operating parameters.
gm_allsize         Tests network links.
gm_debug           Tests PCI bandwidth.
gm_board_info      Tests host detection.
gm_stress          Exercises the network; it might potentially detect workload problems.
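Independently of this test suite, the Myrinet GM software also provides the standalone gm_board_info utility, which reports the adapter and the hosts it has detected; if the GM tools are in your path, a manual check on a single node is simply:
# gm_board_info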
.
.
.
date time qsctrl: passed bus control check (ok)
date time qsctrl: passed gateway check (bootp,0.0.0.0)
date time qsctrl: passed module heartbeat check
date time qsctrl: passed firmware version check (43-4081899)
date time qsctrl: passed tftp server check (172.20.0.16)
date time qsctrl: passed upgrade file check (503-upgrade.
.
.
.
the node will fail on all the other levels because it sends data through level1 to reach the higher levels. Ensure that level1 passes before testing level2.
Note
This diagnostic tool uses the adapter and the link 100 percent of the time during the test and, as a result, has a great effect on machine performance. You must run the qsnet2_level_test utility as superuser from the head node.
# qsnet2_level_test level1 -d \
/hptc_cluster/adm/logs/diag/quadrics -r 0 -clean -v
Example 2
The following example tests level3. Both nodes specified, n1 and n2, save their log files to the global directory /hptc_cluster/adm/logs/diag/quadrics in the directory named level3. Running the qsnet2_level_test diagnostic tool on only two nodes is useful because you can verify that a failing route has been repaired without affecting the use of the rest of the system.
Note
You must launch this command from the head node. Run this command only during allocated preventive maintenance time frames because this diagnostic tool uses the adapter and the link 100 percent of the time during the test and, as a result, has a great effect on machine performance.
The command format for the qsnet2_drain_test utility is shown here:
qsnet2_drain_test [-help] [-d logdirectory]
The -help option displays the command line options. The -d option enables you to specify a log directory.
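For example, the following command runs the test and writes its logs to the global diagnostic log directory used elsewhere in this chapter:
# qsnet2_drain_test -d /hptc_cluster/adm/logs/diag/quadrics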
17. Troubleshooting
This chapter provides information to help you troubleshoot problems with your HP XC system. It addresses the following topics:
• System Interconnect Troubleshooting (page 159)
• SLURM Troubleshooting (page 163)
• LSF-HPC Troubleshooting (page 165)
See also Chapter 16: Using Diagnostic Tools (page 149) for information on available diagnostic tools, which can also be used to locate the source of a failure.
6. Run the lsmod command to display loaded modules. You should have one Myrinet GM loadable module installed. # lsmod | grep -i gm gm 589048 3 The size may differ from this output. 7. The Myrinet myri0 interface should be up. Use the ifconfig command to display the interface network configuration: # ifconfig myri0 myri0 Link encap:Ethernet HWaddr 00:60:DD:49:2D:DA inet addr:172.22.0.4 Bcast:172.22.0.255 Mask:255.255.255.
5. Run the lsmod command to display loaded modules. You should have eight Quadrics loadable modules installed.
# lsmod
jtag     30016   0  (unused)
eip      86856   1
ep      821112   9  [eip]
rms      48800   0
elan4   466352   1  [ep]
elan3   606676   0  [ep]
elan     80616   0  [ep elan4 elan3]
qsnet   101040   0  [eip ep rms elan4 elan3 elan]
The sizes may differ from this output.
6.
num_phys_ports=2
port=1  port_state=PORT_ACTIVE  sm_lid=0x0001  port_lid=0x0002  port_lmc=0x00  max_mtu=2048
port=2  port_state=PORT_DOWN    sm_lid=0x0000  port_lid=0x0000  port_lmc=0x00  max_mtu=2048
2. Run the ib-setup command to verify that the configuration is correct. The output should be similar to that in the following example:
# ib-setup
====== Voltaire HCA400 InfiniBand Stack Setup ======
Version: ibhost-v2.1.5_5_itapi: date on amt152. domain.
System: kernel version: 2.4.21-15.10hp. XCsmp, memory 3595MB.
.
.
.
sockets          67456   0  (unused)
sdp             208008   0  [sockets]
ipoib-ud        171072   1
ats              40000   1  [sdp]
devucm           17808   2
q_mng            22184   0
ibat             67928   5  [sockets ipoib-ud devucm]
cm_2             77312   0  [sdp devucm]
gsi              64304   6  [ipoib-ud ats ibat cm_2]
adaptor-tavor   148496   1
vverbs-base      54576   0  [sdp ipoib-ud ats devucm q_mng ibat cm_2 gsi adaptor-tavor]
mlog              9792   0  [sockets sdp ipoib-ud ats devucm q_mng ibat cm_2 gsi adaptor-tavor vverbs-base]
                 82208   0  [sockets sdp ipoib-ud ats q_mng ibat repository cm_2 gsi adaptor-tavor vverbs-base ml
slurm.conf The SLURM configuration file, /hptc_cluster/slurm/etc/slurm.conf.
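If you change this file while the SLURM daemons are running, they must reread it before the change takes effect; with the standard SLURM utilities, one way to do that is:
# scontrol reconfigure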
# sinfo --all
LSF-HPC Troubleshooting
Take the following steps if you have trouble submitting jobs or controlling LSF-HPC:
• Ensure that the number of nodes in the lsf partition is less than or equal to the number of nodes reported in the XC.lic file. Sample entries follow:
INCREMENT XC-CPUS Compaq auth.number exp. date nodes ...
INCREMENT XC-PROCESSORS Compaq auth.number exp. date nodes ...
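For example, the following sketch shows one way to compare the two values; the location of the XC.lic file varies by installation, so substitute the path used at your site:
# sinfo -p lsf
# grep INCREMENT /path/to/XC.lic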
18. Servicing the HP XC System
This section describes procedures for servicing the HP XC system. For more information, see the service guide for your cluster platform. This chapter addresses the following topics:
• Adding a Node (page 167)
• Replacing a Client Node (page 168)
• Replacing a System Interconnect Board in a CP6000 System (page 169)
Adding a Node
The following procedure describes how to add one or more nodes to the HP XC system:
1. Log in as superuser on the head node.
2.
Enter the network type of your system. Valid choices are QMS32 or QMS64: [QMS64]: e. When prompted, enable Web access to the Nagios monitoring application and create a password for the nagiosadmin user. This password does not have to match any other password on your system. Running C50nagios Would you like to enable web based monitoring? ([y]/n) y Enter the password for the 'nagiosadmin' web user: New password: your_nagios_password Re-type new password: your_nagios_password f.
# scontrol update NodeName=n3 State=DRAINING Reason="Maintenance" b. Invoke the stopsys command to shut down the node gracefully: # stopsys n3 WARNING! Verify that the power is disconnected to prevent injury or death. 5. Remove the faulty node from the HP XC system. Note the ports for the Ethernet and system interconnect cables. 6. Add the replacement node to the HP XC system.
5. Remove the old system interconnect board.
6. Install the new system interconnect board.
7. Invoke the startsys command to turn on the node's power:
# startsys --image_and_boot n3
“Using the System Interconnect Diagnostic Tools” (page 152) describes diagnostic tools for the Myrinet, InfiniBand, and Quadrics system interconnects.
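Once the node boots cleanly and passes the interconnect diagnostics, return it to service in SLURM. This assumes the node was drained as shown in the client node replacement procedure; on older SLURM releases you may need State=IDLE instead of State=RESUME:
# scontrol update NodeName=n3 State=RESUME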
A. Installing LSF-HPC for SLURM into an Existing Standard LSF Cluster
This appendix describes how to join an HP XC system running LSF-HPC (integrated with the SLURM resource manager) to an existing Standard LSF cluster without destroying the existing LSF-HPC configuration. After installation, the HP XC system is treated as one host in the overall LSF cluster; that is, it becomes a cluster within the LSF cluster.
Figure A-1.
Requirement LSF-HPC for HP XC can only be added to an existing standard LSF cluster running the most up-to-date version of LSF V6.0 or later. Newer versions of Standard LSF contain the schmod_slurm module necessary to interface properly with LSF-HPC for HP XC systems. Sample Case This section describes the sample case used throughout this appendix. There is an existing Standard LSF cluster installed on a single node with the host name plain. It will be added to the HP XC LSF node with the host name xclsf.
removing /hptc_cluster/lsf/work...
removing /var/lsf...
In this step, you remove the LSF installation from the current LSF_TOP directory, /opt/hptc/lsf/top.
4. Log out then log back in to clear the LSF environment settings.
5. Use NFS to mount a new LSF_TOP tree from the non-HP XC system plain on the HP XC system. In this sample case, the LSF_TOP location is /shared/lsf on the non-HP XC system.
• On plain, the non-XC system, export the directory specified by LSF_TOP to the HP XC system, for example as shown in the following sketch.
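The exact export procedure depends on the NFS configuration of the non-XC system; the following sketch shows one common approach, and the export options and client specification are assumptions to adjust for your network. On plain, add an entry such as the following to /etc/exports and re-export:
/shared/lsf *(rw,no_root_squash,sync)
# exportfs -a
Then, on the HP XC head node, create the mount point and verify that the directory can be mounted:
# mkdir -p /shared/lsf
# mount plain:/shared/lsf /shared/lsf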
-A RH-Firewall-1-INPUT -i External -p tcp -m tcp --dport 1023:65535 -j ACCEPT
-A RH-Firewall-1-INPUT -i External -p udp -m udp --dport 1023:65535 -j ACCEPT
This file establishes the initial firewall rules for all nodes in the HP XC system. These new rules open all the unprivileged ports externally and one privileged port (1023). Opening the privileged port allows LSF commands run as root on the HP XC system to communicate with non-XC LSF daemons, because LSF commands executed by root use privileged ports.
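How and when the new rules take effect depends on how your site applies the firewall configuration (for example, at the next boot or re-imaging). One way to confirm that the rules are active on the nodes is to combine pdsh and iptables, as in this sketch:
# pdsh -a 'iptables -L RH-Firewall-1-INPUT -n' | grep 1023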
. /shared/lsf/conf/profile.lsf.xc
    fi
esac
# cat lsf.csh
if ( "${path}" !~ *-slurm/etc* ) then
    if ( -f /shared/lsf/conf/cshrc.lsf.xc ) then
        source /shared/lsf/conf/cshrc.lsf.xc
    endif
endif
NOTE: The profile.lsf.xc and cshrc.lsf.xc scripts do not exist yet. We will establish them after installing LSF-HPC in a later step. Be sure to replace /shared/lsf with the location of LSF_TOP on your system. Note that the names of the setup files are appended with .xc.
1. Preserve the existing environment setup files.
a. Change directory to the existing LSF_TOP/conf directory.
b. Rename the setup files by appending a unique identifier. For the sample case:
# cd /shared/lsf/conf
# mv profile.lsf profile.lsf.orig
# mv cshrc.lsf cshrc.lsf.orig
On installation, LSF-HPC provides its own profile.lsf and cshrc.lsf files; renaming the existing files now preserves them so that you can restore them later.
Pre-installation check report saved as text file: /shared/lsf/hpctmp/hpc6.1_hpcinstall/prechk.rpt. ... Done LSF pre-installation check. Installing hpc binary files " hpc6.1_linux2.4-glibc2.3-amd64-slurm"... Copying hpcinstall files to /shared/lsf/6.1/install ... Done copying hpcinstall files to /shared/lsf/6.1/install Installing linux2.4-glibc2.3-amd64-slurm ... Please wait, extracting hpc6.1_linux2.4-glibc2.3-amd64-slurm may take up to a few minutes ... ... Done extracting /shared/lsf/hpctmp/hpc6.
"/shared/lsf/hpctmp/hpc6.1_hpcinstall/hpc_getting_started.html". After setting up your LSF server hosts and verifying your cluster "corplsf" is running correctly, see "/shared/lsf/6.1/hpc_quick_admin.html" to learn more about your new LSF cluster. Perform Post Installation Tasks The LSF documentation and instructions mentioned at the end of the hpc_install script are generic and have not been tuned with the HP XC system.
LSF_RSH=ssh
e. Save the file and exit the text editor.
5. Optional: Configure any special HP XC-specific queues. XC V2.1 recommends configuring a JOB_STARTER script for all queues on an HP XC system. The default installation of LSF on the HP XC system provides default queue configurations, which can be found in /opt/hptc/lsf/etc/configdir/lsb.queues.
6.
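For reference, a queue definition of the kind recommended in step 5 might look like the following minimal sketch; the queue name and the choice of srun as the JOB_STARTER are illustrative assumptions, so compare it with the default definitions shipped in /opt/hptc/lsf/etc/configdir/lsb.queues:
Begin Queue
QUEUE_NAME   = normal
PRIORITY     = 30
JOB_STARTER  = srun
DESCRIPTION  = Default queue with a SLURM job starter
End Queue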
When the LSF daemons have started up and synchronized their data with the rest of the LSF cluster, the lshosts and bhosts commands should display all the nodes with their appropriate values and indicate that they are ready for use:
$ lshosts
HOST_NAME type     model    cpuf ncpus maxmem maxswp server RESOURCES
plain     LINUX86  PC1133   23.1     2   248M  1026M Yes    ()
xclsf     SLINUX6  Intel_EM 60.
Example A-5. As a User on the HP XC Node Launching to an HP XC Resource
$ bsub -I -n6 -R type=SLINUX64 srun hostname
Job <416> is submitted to default queue .
<>
<>
xc3
xc3
xc2
xc2
xc1
xc1
Troubleshooting
• Use the following commands to check your configuration changes:
  • iptables -L and other options to confirm the firewall settings
  • pdsh -a 'ls -l /etc/init.
B. Installing Standard LSF on a Subset of Nodes
This appendix provides instructions for installing Standard LSF on a subset of nodes in the HP XC system while another subset of nodes runs LSF-HPC. This configuration is useful for an HP XC system that is composed of two different types of nodes, for example, a set of large SMP nodes (“fat” nodes) running LSF-HPC and a set of “thin” nodes running Standard LSF, as Figure B-1 shows.
Figure B-1.
Sample Case
Consider an HP XC system of 128 nodes consisting of:
• A head node with a host name of xc128
• 6 large SMP nodes (or fat nodes) with the host names xc[1-6]
• 122 thin nodes; 114 of the thin nodes are compute nodes and have host names of xc[7-120]
Instructions
1. Log in to the head node of the HP XC system as superuser (root). Do not log in through the cluster alias.
2. Change directory to /opt/hptc/lsf/top/conf.
3. Rename the existing setup files:
# mv profile.lsf profile.lsf.xc
# mv cshrc.
< # Currently we only check for HP-hptc: /var/slurm/lsfslurm
---
> # Currently we only check for HP-hptc: /etc/hptc-release
127c127
< _slurm_signature_file="/var/slurm/lsfslurm"
---
> _slurm_signature_file="/etc/hptc-release"
# diff cshrc.tmp cshrc.lsf.notxc
266c266
< if ( -f /var/slurm/lsfslurm ) then
---
> if ( -f /etc/hptc-release ) then
• Replace the old file with the new file:
# mv -f profile.tmp profile.lsf.notxc
# mv -f cshrc.tmp cshrc.lsf.notxc
6.
# chkconfig --add slsf
# chkconfig --list slsf
slsf    0:off  1:off  2:off  3:on  4:on  5:on  6:off
f. Edit the /opt/hptc/systemimager/etc/chkconfig.map file to add the following line to enable this new "service" on all nodes in the HP XC system:
slsf 0:off 1:off 2:off 3:on 4:on 5:on 6:off
8. Update the node roles and re-image:
a. Use the stopsys command to shut down the other nodes of the HP XC system.
b. Change directory to /opt/hptc/config/sbin.
c. Execute the cluster_config utility.
d.
   • Select Modify Nodes.
<>
<>
xc120
$ bsub -I -n2 -R type=SLINUX64 srun hostname
Job <178> is submitted to default queue .
<>
<
Glossary A administration branch The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes of the HP XC system. administration network The private network within the HP XC system that is used for administrative operations. B base image The collection of files and directories that represents the common files and configuration data that are applied to all nodes in an HP XC system. branch switch A component of the Administration Network.
FCFS First-come, first-served. An LSF job-scheduling policy that specifies that jobs are dispatched according to their order in a queue, which is determined by job priority, not by order of submission to the queue. first-come, first-served See FCFS. G global storage Storage within the HP XC system that is available to all of the nodes in the system. Also known as shared storage. golden client The node from which a standard file system image is created.
L Linux Virtual Server See LVS. load file A file containing the names of multiple executables that are to be launched simultaneously by a single command. Load Sharing Facility See LSF-HPC with SLURM. local storage Storage that is available or accessible from one node in the HP XC system. LSF execution host The node on which LSF runs. A user's job is submitted to the LSF execution host. Jobs are launched from the LSF execution host and are executed on one or more compute nodes.
Network Information Services See NIS. NIS Network Information Services. A mechanism that enables centralization of common data that is pertinent across multiple machines in a network. The data is collected in a domain, within which it is accessible and relevant. The most common use of NIS is to maintain user account information across a set of networked hosts. NIS client Any system that queries NIS servers for NIS database information.
SMP Symmetric multiprocessing. A system with two or more CPUs that share equal (symmetric) access to all of the facilities of a computer system, such as the memory and I/O subsystems. In an HP XC system, the use of SMP technology increases the number of CPUs (amount of computational power) available per unit of space. ssh Secure Shell. A shell program for logging in to and executing commands on a remote computer.
Index A adding a local user account, 97 adding a node, 167 adding a service, 47 archive.cron script, 60 B bacct command, 127 base_image, 84 updating, 84 base_exclude_file, 85 bhist command, 126 /bin directory, 31 bjobs command, 126 bkill command, 126 bresume command, 126 bstop command, 126 C cexec command, 26 changing the root password, 99 check_lsf plug-in, 129 chkconfig.
HP XC, 31 local, 31 NFS, 139 symbolic links, 32 XC specific, 32 firewall, 91–92, 165 fstab.
restore, 59–60 management hub services, 24 managing licenses, 55 modifying a local user account, 98 modulefile loading, 34, 137 managing, 34, 137 unloading, 35, 137 viewing available, 34, 137 viewing loaded, 34, 137 monitoring hierarchy, 61 strategy, 61 monitoring SLURM, 113 monitoring tools, 61 mounting file systems, 139 MUNGE authentication package, 164 Myrinet system interconnect diagnostic tools, 152 troubleshooting, 159 MySQL, 24, 29, 57 accessing, 57 N Nagios, 45, 62, 67, 113 LSF monitoring, 129 LSF-
central control daemon service, 24 compute service, 24 configuration and management database, 24 configuration files, 33 DHCP, 25 display all, 45 display node that provides a service, 46, 58 display services for node, 46 global maintenance, 88 global system, 53 I/O, 24 login, 24 LSF, 24 LVS director, 24 management hub services, 24 NAT, 24 NTP, 78 power daemon, 25 restarting, 47 stopping, 47 SystemImager, 24 service command, 43 service group, 28 setnode command, 26, 41 SFS Nagios host, 67 sftp command, 25 sh