Managing Serviceguard A.11.20, March 2013

8 Troubleshooting Your Cluster....................................................................327
Testing Cluster Operation ......................................................................................................327
Start the Cluster using Serviceguard Manager.....................................................................327
Testing the Package Manager ...........................................................................................327
Testing the Cluster Manager .............................................................................................328
Testing the Network Manager ..........................................................................................328
Monitoring Hardware ...........................................................................................................328
Using System Fault Management Service.............................................................................329
Using Event Monitoring Service..........................................................................................329
Using EMS (Event Monitoring Service) Hardware Monitors....................................................329
Hardware Monitors and Persistence Requests.......................................................................329
Using HP ISEE (HP Instant Support Enterprise Edition)............................................................330
Replacing Disks....................................................................................................................330
Replacing a Faulty Array Mechanism..................................................................................330
Replacing a Faulty Mechanism in an HA Enclosure..............................................................330
Replacing a Lock Disk.......................................................................................................331
Replacing a Lock LUN......................................................................................................331
Online Hardware Maintenance with In-line SCSI Terminator .................................................332
Replacing I/O Cards............................................................................................................332
Replacing SCSI Host Bus Adapters.....................................................................................332
Revoking Persistent Reservations after a Failure.........................................................................333
Examples........................................................................................................................333
Replacing LAN or Fibre Channel Cards...................................................................................333
Offline Replacement.........................................................................................................334
Online Replacement.........................................................................................................334
After Replacing the Card..................................................................................................334
Replacing a Failed Quorum Server System...............................................................................334
Troubleshooting Approaches .................................................................................................335
Reviewing Package IP Addresses .......................................................................................336
Reviewing the System Log File ...........................................................................................336
Sample System Log Entries ...........................................................................................336
Reviewing Object Manager Log Files .................................................................................337
Reviewing Serviceguard Manager Log Files ........................................................................337
Reviewing the System Multi-node Package Files....................................................................337
Reviewing Configuration Files ...........................................................................................337
Reviewing the Package Control Script ................................................................................337
Using the cmcheckconf Command......................................................................................338
Reviewing the LAN Configuration ......................................................................................338
Solving Problems .................................................................................................................338
Serviceguard Command Hangs.........................................................................................339
Networking and Security Configuration Errors.....................................................................339
Cluster Re-formations Caused by Temporary Conditions........................................................339
Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low.................................340
System Administration Errors .............................................................................................340
Package Control Script Hangs or Failures ......................................................................341
Problems with Cluster File System (CFS)...............................................................................343
Problems with VxVM Disk Groups......................................................................................343
Force Import and Deport After Node Failure...................................................................343
Package Movement Errors ................................................................................................344
Node and Network Failures .............................................................................................344
Troubleshooting the Quorum Server....................................................................................344
Authorization File Problems...........................................................................................344
Timeout Problems........................................................................................................345
Messages...................................................................................................................345