Managing Serviceguard Eighteenth Edition, September 2010
Networking and Security Configuration Errors.........................................................414
Cluster Re-formations Caused by Temporary Conditions..........................................414
Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low.............415
System Administration Errors ....................................................................................416
Package Control Script Hangs or Failures ............................................................416
Problems with Cluster File System (CFS)....................................................................418
Problems with VxVM Disk Groups.............................................................................419
Force Import and Deport After Node Failure........................................................419
Package Movement Errors ..........................................................................................420
Node and Network Failures .......................................................................................420
Troubleshooting the Quorum Server...........................................................................421
Authorization File Problems..................................................................................421
Timeout Problems..................................................................................................421
Messages................................................................................................................421
A Enterprise Cluster Master Toolkit .............................................................................................423
B Designing Highly Available Cluster Applications .......................................................................425
Automating Application Operation ................................................................................425
Insulate Users from Outages ......................................................................................426
Define Application Startup and Shutdown ................................................................426
Controlling the Speed of Application Failover ................................................................427
Replicate Non-Data File Systems ...............................................................................427
Use Raw Volumes .......................................................................................................427
Evaluate the Use of JFS ...............................................................................................427
Minimize Data Loss ....................................................................................................427
Minimize the Use and Amount of Memory-Based Data ......................................428
Keep Logs Small ....................................................................................................428
Eliminate Need for Local Data ..............................................................................428
Use Restartable Transactions ......................................................................................428
Use Checkpoints .........................................................................................................429
Balance Checkpoint Frequency with Performance ...............................................429
Design for Multiple Servers ........................................................................................429
Design for Replicated Data Sites ................................................................................430
Designing Applications to Run on Multiple Systems .....................................................430
Avoid Node-Specific Information ..............................................................................431
Obtain Enough IP Addresses ................................................................................431
Allow Multiple Instances on Same System ...........................................................431
Avoid Using SPU IDs or MAC Addresses .................................................................432
Assign Unique Names to Applications ......................................................................432
Use DNS ................................................................................................................432
Use uname(2) With Care ............................................................................................433
Bind to a Fixed Port ....................................................................................................433