Managing Serviceguard 11th Edition, Version A.11.16, Second Printing June 2004

Designing Highly Available Cluster Applications
Minimizing Planned Downtime
Appendix C
Provide for Rolling Upgrades
Provide for a “rolling upgrade” in a client/server environment. For a
system with many components, the typical scenario is to bring down the
entire system, upgrade every node to the new version of the software,
and then restart the application on all the affected nodes. For large
systems, this could result in a long downtime.
An alternative is to provide for a rolling upgrade. A rolling upgrade rolls
out the new software in a phased approach by upgrading only one
component at a time. For example, the database server is upgraded on
Monday, causing a 15-minute downtime. Then on Tuesday, the
application server on two of the nodes is upgraded, which leaves the
application servers on the remaining nodes online and causes no
downtime. On Wednesday, two more application servers are upgraded,
and so on. With this approach, you avoid changing everything at once,
and you replace one long outage with a series of short ones or none.
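The phased approach described above can be sketched in a few lines of
Python. This is only an illustration, not a Serviceguard tool: the
node names, the revision strings, and the functions
plan_rolling_upgrade and simulate are hypothetical. The sketch groups
nodes into batches and checks that every batch leaves at least one
node online.

```python
def plan_rolling_upgrade(nodes, batch_size):
    """Group nodes into upgrade batches, one batch per maintenance window."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if batch_size >= len(nodes):
        # Upgrading every node at once is the full outage we want to avoid.
        raise ValueError("each batch must leave at least one node online")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def simulate(nodes, batch_size, old="4.0", new="5.0"):
    """Apply the plan; return final revisions and the fewest nodes ever online."""
    versions = {node: old for node in nodes}
    min_online = len(nodes)
    for batch in plan_rolling_upgrade(nodes, batch_size):
        # Only the nodes in the current batch are offline during this step.
        min_online = min(min_online, len(nodes) - len(batch))
        for node in batch:
            versions[node] = new
    return versions, min_online


# Hypothetical application servers, upgraded two at a time.
app_servers = ["app1", "app2", "app3", "app4"]
versions, min_online = simulate(app_servers, batch_size=2)
```

With four nodes and a batch size of two, two application servers stay
online during each step, which matches the Tuesday and Wednesday
example above.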
For information about the supported Serviceguard releases for rolling
upgrade, see the Serviceguard Release Notes for your version, at
http://docs.hp.com/hpux/ha.
The trade-off is that the application must operate while its
components are at different revisions. In the above example, the
database server might be at revision 5.0 while some of the application
servers are still at revision 4.0. The application must be designed to
handle this type of situation.
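One common way to tolerate mixed revisions is for each side to
advertise the revisions it can speak and then agree on the highest one
both support. The following Python sketch assumes that design. The
function negotiate_revision and the (major, minor) revision tuples are
hypothetical and are not part of Serviceguard or of any particular
application.

```python
def negotiate_revision(client_supported, server_supported):
    """Return the highest revision that both sides support.

    Revisions are (major, minor) tuples so that they compare numerically.
    """
    common = set(client_supported) & set(server_supported)
    if not common:
        raise RuntimeError("no common revision; upgrade cannot be rolling")
    return max(common)


# Hypothetical example: the database server has been upgraded to 5.0 but
# still accepts 4.0 clients; one application server is not yet upgraded.
db_server = [(4, 0), (5, 0)]
old_app_server = [(4, 0)]
new_app_server = [(4, 0), (5, 0)]

rev_old = negotiate_revision(old_app_server, db_server)
rev_new = negotiate_revision(new_app_server, db_server)
```

In this sketch the application server that has not been upgraded keeps
talking to the database server at revision 4.0, while an upgraded
application server uses 5.0.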
Do Not Change the Data Layout Between Releases
Migration of the data to a new format can be very time-intensive. It
also almost guarantees that rolling upgrade will not be possible. For example,
if a database is running on the first node, ideally, the second node could
be upgraded to the new revision of the database. When that upgrade is
completed, a brief downtime could be scheduled to move the database
server from the first node to the newly upgraded second node. The
database server would then be restarted, while the first node is idle and
ready to be upgraded itself. However, if the new database revision
requires a different database layout, the old data will not be readable by
the newly updated database. The downtime will be longer as the data is
migrated to the new layout.
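If a new revision must store additional information, one way to keep
the existing layout readable is to make the change purely additive.
The Python sketch below assumes records stored as JSON text and a
hypothetical field named region added in revision 5.0. The function
names are illustrative only. The newer reader supplies a default for
the missing field, and the older reader ignores fields it does not
know, so either revision can read data written by the other.

```python
import json


def load_record_v4(raw):
    """Hypothetical revision 4.0 reader: uses only the fields it knows."""
    record = json.loads(raw)
    return {"id": record["id"], "name": record["name"]}


def load_record_v5(raw):
    """Hypothetical revision 5.0 reader: accepts 4.0 and 5.0 records."""
    record = json.loads(raw)
    # "region" is an assumed field added in 5.0; default it for old data.
    record.setdefault("region", "unknown")
    return record


old_data = '{"id": 1, "name": "orders"}'
new_data = '{"id": 2, "name": "billing", "region": "emea"}'

read_old_with_new = load_record_v5(old_data)
read_new_with_old = load_record_v4(new_data)
```

Because no data is rewritten, the database can move between nodes
running either revision without a migration step.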