HP XC System Software Administration Guide Version 3.2.1
21.4.4 OFED Troubleshooting Procedures.....................................................................................261
21.5 Improved Availability Issues.......................................................................................................264
21.5.1 How To Start HP Serviceguard When Only the Head Node is Running...........................264
21.5.2 Restart Serviceguard Quorum Server if Quorum Server Node is Re-imaged....................264
21.5.3 Known Limitation if Nagios is Configured for Improved Availability..............................264
21.5.4 Network Restart Command Negatively Affects Serviceguard...........................................265
21.5.5 Problem Failing Over Database Package Under Serviceguard...........................................265
21.6 SLURM Troubleshooting.............................................................................................................265
21.6.1 SLURM Configuration Issues..............................................................................................265
21.6.2 SLURM Run-Time Troubleshooting....................................................................................266
21.7 LSF-HPC Troubleshooting...........................................................................................................267
22 Servicing the HP XC System...................................................................................271
22.1 Adding a Node............................................................................................................................271
22.2 Replacing a Client Node..............................................................................................................273
22.3 Actualizing Planned Nodes.........................................................................................................274
22.4 Replacing a Server Blade Enclosure Onboard Administrator.....................................................276
22.5 Replacing a System Interconnect Board in an HP CP6000 System.............................................277
22.6 Software RAID Disk Replacement...............................................................................................278
22.6.1 Replacing a RAID Disk........................................................................................................278
22.6.2 Writing a Boot Block to the RAID Disk...............................................................................280
22.7 Incorporating External Network Interface Cards........................................................................281
22.7.1 Gathering Information.........................................................................................................282
22.7.1.1 Gathering Node-Specific Information.........................................................................282
22.7.1.2 Determining NIC-Specific Information.......................................................................283
22.7.1.3 Gathering Networking Information............................................................................285
22.7.1.4 Consolidating Information in the NIC Data Worksheet.............................................285
22.7.2 Editing the platform_vars.ini File........................................................................................285
22.7.3 Using the device_config Command....................................................................................289
22.7.4 Updating the Database for the External Network Card......................................................289
22.7.5 Updating the Firewall Custom Configuration....................................................................290
22.7.5.1 Verifying the Updated CMDB.....................................................................................292
22.7.6 Reconfiguring the Nodes.....................................................................................................293
22.7.7 Verifying Success.................................................................................................................293
22.7.7.1 Verifying the Ethernet Port..........................................................................................294
22.7.7.2 Verifying the Ethernet Device.....................................................................................294
22.7.7.3 Testing the Network Connection.................................................................................294
22.7.8 Updating the Golden Image................................................................................................294
A Installing LSF-HPC with SLURM into an Existing Standard LSF Cluster................295
A.1 Assumptions.................................................................................................................................295
A.2 Requirement.................................................................................................................................296
A.3 Sample Case..................................................................................................................................296
A.4 HP XC Preparation.......................................................................................................................296
A.5 Installing LSF-HPC with SLURM.................................................................................................301
A.6 Perform Post Installation Tasks....................................................................................................304
A.7 Configuring the LSF Alias............................................................................................................305
A.8 Starting LSF on the HP XC System...............................................................................................306
A.9 Sample Running Jobs....................................................................................................................307
A.10 Troubleshooting..........................................................................................................................307
B Installing Standard LSF on a Subset of Nodes.......................................................309
B.1 Requirements................................................................................................................................309