Managing Serviceguard Eighteenth Edition, September 2010

Minimizing Planned Downtime

Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, system upgrades to new operating system revisions, and hardware replacements. For planned downtime, application designers should consider:

• Reducing the time needed for application upgrades/patches.
  Can an administrator install a new version of the application without scheduling downtime? Can different revisions of an application operate within a system? Can different revisions of a client and server operate within a system?

• Providing for online application reconfiguration.
  Can the configuration information used by the application be changed without bringing down the application?

• Documenting maintenance operations.
  Does an operator know how to handle maintenance operations?
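As an illustration of the online reconfiguration question above, the following is a minimal sketch, not a Serviceguard facility. It assumes the application keeps its settings in a JSON file; the class name `ReloadableConfig`, the method `reload_if_changed`, and the `pool_size` setting are hypothetical names chosen for this example. The idea is that the running application periodically checks the file and swaps in the new values without restarting.

```python
import json
import os
import threading


class ReloadableConfig:
    """Holds settings that can be re-read from disk while the application runs.

    Hypothetical helper for illustration; not part of any HP product.
    """

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._mtime = None
        self._values = {}
        self.reload_if_changed()

    def reload_if_changed(self):
        """Re-read the file if its modification time changed.

        Returns True when new values were loaded, False otherwise.
        """
        mtime = os.stat(self._path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self._path, encoding="utf-8") as f:
            # Parse the whole file first, so a malformed file raises
            # before the values currently in use are replaced.
            new_values = json.load(f)
        with self._lock:
            self._values = new_values
            self._mtime = mtime
        return True

    def get(self, key, default=None):
        """Return the current value for key, or default if it is not set."""
        with self._lock:
            return self._values.get(key, default)
```

An application could call `reload_if_changed()` from a timer or in response to an operator command. Settings that are read only once at startup would still require a restart, so a design along these lines only helps for values the application looks up each time it needs them.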

When discussing highly available systems, unplanned failures are often the main point of discussion. However, if it takes two weeks to upgrade a system to a new revision of software, there are bound to be many complaints.

The following sections discuss ways of handling the different types of planned downtime.

Reducing Time Needed for Application Upgrades and Patches

Once a year or so, a new revision of an application is released. How long does it take for the end user to upgrade to this new revision? The answer is the amount of planned downtime a user must take to upgrade the application. The following guidelines reduce this time.

Provide for Rolling Upgrades

Provide for a “rolling upgrade” in a client/server environment. For a system with many components, the typical scenario is to bring down the entire system, upgrade every node to the new version of the software, and then restart the application on all the affected nodes. For large systems, this could result in a long downtime.

An alternative is to provide for a rolling upgrade. A rolling upgrade rolls out the new software in phases, upgrading only one component at a time. For example, the database server is upgraded on Monday, causing 15 minutes of downtime. Then on Tuesday, the application server on two of the nodes is upgraded, which leaves the application servers on the remaining nodes online and causes no downtime. On Wednesday, two more application servers are upgraded, and so on. With this approach, you avoid the problem where everything changes at once, and you avoid long outages.
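The phased approach described above can be sketched as a small simulation. This is an illustrative sketch only, not a Serviceguard procedure: the node names, the function names `plan_batches` and `simulate_rolling_upgrade`, and the compatibility rule `same_major` (revisions interoperate when their major numbers match) are all assumptions made for this example. A real application would define its own rule for which revisions can coexist.

```python
from itertools import combinations


def plan_batches(nodes, batch_size):
    """Split nodes into batches so that only batch_size nodes are down at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size >= len(nodes):
        raise ValueError("batch would take every node offline at once")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def same_major(rev_a, rev_b):
    """Hypothetical rule: revisions interoperate when major numbers match."""
    return rev_a.split(".")[0] == rev_b.split(".")[0]


def simulate_rolling_upgrade(current, new_revision, batch_size,
                             compatible=same_major):
    """Upgrade one batch at a time, checking that mixed revisions can coexist.

    current maps node name to its installed revision. Returns a list of
    (batch, state_after_batch) tuples; raises RuntimeError before applying
    a batch that would leave incompatible revisions running together.
    """
    state = dict(current)
    steps = []
    for batch in plan_batches(sorted(state), batch_size):
        proposed = {**state, **{node: new_revision for node in batch}}
        for a, b in combinations(set(proposed.values()), 2):
            if not compatible(a, b):
                raise RuntimeError(f"revisions {a} and {b} cannot coexist")
        state = proposed
        steps.append((batch, dict(state)))
    return steps


if __name__ == "__main__":
    servers = {"app1": "5.0", "app2": "5.0", "app3": "5.0", "app4": "5.0"}
    for batch, state in simulate_rolling_upgrade(servers, "5.1", batch_size=2):
        print("upgraded", batch, "->", state)
```

Checking compatibility before each batch reflects the constraint that, during a rolling upgrade, old and new revisions run side by side until the last batch completes.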

The trade-off is that the application software must operate with different revisions of the software. In the above example, the database server might be at revision 5.0 while

440 Designing Highly Available Cluster Applications