Managing Serviceguard Seventeenth Edition, First Reprint December 2009
8 Troubleshooting Your Cluster....................................................................................................365
Testing Cluster Operation ................................................................................................365
Start the Cluster using Serviceguard Manager...........................................................365
Testing the Package Manager .....................................................................................365
Testing the Cluster Manager .......................................................................................366
Testing the Network Manager ....................................................................................366
Monitoring Hardware ......................................................................................................367
Using Event Monitoring Service.................................................................................367
Using EMS (Event Monitoring Service) Hardware Monitors.....................................368
Hardware Monitors and Persistence Requests............................................................368
Using HP ISEE (HP Instant Support Enterprise Edition)...........................................368
Replacing Disks.................................................................................................................368
Replacing a Faulty Array Mechanism.........................................................................368
Replacing a Faulty Mechanism in an HA Enclosure...................................................369
Replacing a Lock Disk.................................................................................................370
Replacing a Lock LUN.................................................................................................370
Online Hardware Maintenance with In-line SCSI Terminator ...................................372
Replacing I/O Cards..........................................................................................................372
Replacing SCSI Host Bus Adapters.............................................................................372
Replacing LAN or Fibre Channel Cards...........................................................................372
Offline Replacement....................................................................................................373
Online Replacement....................................................................................................373
After Replacing the Card.............................................................................................373
Replacing a Failed Quorum Server System......................................................................374
Troubleshooting Approaches ...........................................................................................375
Reviewing Package IP Addresses ...............................................................................375
Reviewing the System Log File ..................................................................................376
Sample System Log Entries ...................................................................................376
Reviewing Object Manager Log Files .........................................................................377
Reviewing Serviceguard Manager Log Files ..............................................................377
Reviewing the System Multi-node Package Files........................................................377
Reviewing Configuration Files ...................................................................................377
Reviewing the Package Control Script .......................................................................377
Using the cmcheckconf Command..........................................................................378
Using the cmviewconf Command.............................................................................378
Reviewing the LAN Configuration ............................................................................378
Solving Problems .............................................................................................................379
Serviceguard Command Hangs..................................................................................379
Networking and Security Configuration Errors.........................................................379
Cluster Re-formations Caused by Temporary Conditions..........................................380
Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low.............380
System Administration Errors ....................................................................................381
Package Control Script Hangs or Failures ............................................................382
Problems with Cluster File System (CFS)....................................................................384