HP XC System Software Administration Guide Version 4.0

Table Of Contents
21.2.5 Using the nrg Command's Analyze Mode
21.3 Messages Reported by Nagios
21.4 System Interconnect Troubleshooting
21.4.1 Myrinet System Interconnect Troubleshooting
21.4.2 Quadrics System Interconnect Troubleshooting
21.4.3 OFED Troubleshooting Procedures
21.5 Improved Availability Issues
21.5.1 How To Start HP Serviceguard When Only the Head Node is Running
21.5.2 Restart Serviceguard Quorum Server if Quorum Server Node is Re-imaged
21.5.3 Known Limitation if Nagios is Configured for Improved Availability
21.5.4 Network Restart Command Negatively Affects Serviceguard
21.5.5 Problem Failing Over Database Package Under Serviceguard
21.6 SLURM Troubleshooting
21.6.1 SLURM Configuration Issues
21.6.2 SLURM Run-Time Troubleshooting
21.7 LSF Troubleshooting
22 Servicing the HP XC System
22.1 Adding a Node
22.2 Replacing a Client Node
22.3 Actualizing Planned Nodes
22.4 Replacing a Server Blade Enclosure OnBoard Administrator
22.5 Replacing a System Interconnect Board in an HP CP6000 System
22.6 Software RAID Disk Replacement
22.6.1 Replacing a RAID Disk
22.6.2 Writing a Boot Block to the RAID Disk
22.7 Incorporating External Network Interface Cards
22.7.1 Gathering Information
22.7.1.1 Gathering Node-Specific Information
22.7.1.2 Determining NIC-Specific Information
22.7.1.3 Gathering Networking Information
22.7.1.4 Consolidating Information in the NIC Data Worksheet
22.7.2 Editing the platform_vars.ini File
22.7.3 Using the device_config Command
22.7.4 Updating the Database for the External Network Card
22.7.5 Updating the Firewall Custom Configuration
22.7.5.1 Verifying the Updated CMDB
22.7.6 Reconfiguring the Nodes
22.7.7 Verifying Success
22.7.7.1 Verifying the Ethernet Port
22.7.7.2 Verifying the Ethernet Device
22.7.7.3 Testing the Network Connection
22.7.8 Updating the Golden Image
A Installing LSF with SLURM into an Existing Standard LSF Cluster
A.1 Assumptions
A.2 Requirement
A.3 Sample Case
A.4 HP XC Preparation
A.5 Installing LSF with SLURM
A.6 Perform Post Installation Tasks
A.7 Configuring the LSF Alias
A.8 Starting LSF on the HP XC System