Managing HP Serviceguard A.11.20.00 for Linux, June 2012
Renaming or Replacing an External Script Used by a Running Package...................................218
Reconfiguring a Package on a Halted Cluster .....................................................................218
Adding a Package to a Running Cluster..............................................................................218
Deleting a Package from a Running Cluster ........................................................................219
Resetting the Service Restart Counter..................................................................................219
Allowable Package States During Reconfiguration ...............................................................219
Changes that Will Trigger Warnings..............................................................................223
Responding to Cluster Events .................................................................................................223
Single-Node Operation ........................................................................................................224
Removing Serviceguard from a System.....................................................................................224
8 Troubleshooting Your Cluster....................................................................225
Testing Cluster Operation ......................................................................................................225
Testing the Package Manager ...........................................................................................225
Testing the Cluster Manager .............................................................................................226
Monitoring Hardware ...........................................................................................................226
Replacing Disks....................................................................................................................227
Replacing a Faulty Mechanism in a Disk Array....................................................................227
Replacing a Lock LUN......................................................................................................227
Revoking Persistent Reservations after a Catastrophic Failure.......................................................227
Examples........................................................................................................................228
Replacing LAN Cards...........................................................................................................228
Replacing a Failed Quorum Server System...............................................................................229
Troubleshooting Approaches .................................................................................................230
Reviewing Package IP Addresses .......................................................................................230
Reviewing the System Log File ...........................................................................................231
Sample System Log Entries ...........................................................................................231
Reviewing Configuration Files ...........................................................................................232
Reviewing the Package Control Script ................................................................................232
Using the cmquerycl and cmcheckconf Commands...............................................................232
Reviewing the LAN Configuration ......................................................................................232
Solving Problems .................................................................................................................233
Name Resolution Problems................................................................................................233
Networking and Security Configuration Errors.................................................................233
Halting a Detached Package.............................................................................................233
Cluster Re-formations Caused by Temporary Conditions........................................................233
Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low.................................234
System Administration Errors .............................................................................................234
Package Control Script Hangs or Failures ......................................................................235
Package Movement Errors (Legacy Packages).......................................................................236
Node and Network Failures .............................................................................................237
Troubleshooting the Quorum Server....................................................................................237
Authorization File Problems...........................................................................................237
Timeout Problems........................................................................................................237
Messages...................................................................................................................238
Lock LUN Messages.........................................................................................................238
Troubleshooting Serviceguard Manager...................................................................................238
A Designing Highly Available Cluster Applications .......................................239
Automating Application Operation ........................................................................................239
Insulate Users from Outages .............................................................................................239
Define Application Startup and Shutdown ..........................................................................240
Controlling the Speed of Application Failover ..........................................................................240
Replicate Non-Data File Systems .......................................................................................240
Evaluate the Use of a Journaled Filesystem (JFS)...................................................................241
Minimize Data Loss .........................................................................................................241