Providing Open Architecture High Availability Solutions


11.0 Layer-Specific Capabilities – Applications

When reviewing high availability capabilities from an application perspective, it is important to understand the application's objectives. A system is made up of many different applications, and there are many ways to classify them and their ability to participate in, control, or operate within a highly available system. Because most applications are not focused specifically on fault management, the system should support that task by providing a management interface that allows an application to monitor and control operations. Additionally, an application should be able to supply its health information to a management tracking function.

To that end, the following groups can be defined for application capabilities:

• Status

• Notification

• Failover

• Recovery

• Resilience
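The overview above asks that an application be able to supply its health information to a management tracking function. The sketch below is one hypothetical way to model that exchange: `Health`, `HealthTracker`, `report`, and `needs_attention` are illustrative names, not part of any standard, and the choice to treat a silent application as suspect after `stale_after` seconds is an assumption made for the example.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Health(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    FAILED = "failed"


@dataclass
class HealthRecord:
    state: Health
    reported_at: float  # clock reading when the application last reported


class HealthTracker:
    """Hypothetical management tracking function that applications report to."""

    def __init__(self, stale_after: float, clock: Callable[[], float] = time.monotonic):
        self.stale_after = stale_after  # seconds of silence before an app is suspect
        self.clock = clock
        self.records: dict[str, HealthRecord] = {}

    def report(self, app: str, state: Health) -> None:
        # Called by the application itself to publish its current health.
        self.records[app] = HealthRecord(state, self.clock())

    def needs_attention(self) -> list[str]:
        # An app needs attention if it reported a non-OK state or has gone quiet.
        now = self.clock()
        return sorted(
            app
            for app, rec in self.records.items()
            if rec.state is not Health.OK or now - rec.reported_at > self.stale_after
        )


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
tracker = HealthTracker(stale_after=5.0, clock=lambda: now[0])
tracker.report("billing", Health.OK)
tracker.report("db-proxy", Health.DEGRADED)
now[0] = 3.0
tracker.report("web", Health.OK)
now[0] = 7.0
print(tracker.needs_attention())  # ['billing', 'db-proxy']
```

Letting the tracker flag silence as well as explicit failure reports means an application that has hung, and therefore cannot report anything, is still noticed.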

11.1 Status

An application needs to know enough about the system configuration to determine whether it is capable of running. For example, it must know whether the hardware, storage, and/or communications paths it needs exist and are operating. This requires that the application be able to extract knowledge of the system through a process called discovery. This information can be very platform- and software-specific, but general concepts can still be applied. Simple management interfaces already exist and can be leveraged to help with discovery: standards such as CIM and IPMI, as well as network-based management interfaces such as SNMP, can provide some or all of the information. A practical approach to gathering status is to aim for the widest breadth of information with limited depth, provided in a common way on all platforms and operating environments. If the application needs additional hardware-specific information, it should be able to obtain it through a platform-, vendor-, or operating-system-specific method.
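The broad-but-shallow approach to discovery described above can be sketched as a common interface that reports only whether each resource exists and is operating, with a separate hook for deeper platform-specific detail. Everything here is hypothetical: `Component`, `DiscoveryProvider`, `StaticProvider`, and `can_run` are illustrative names, and a real provider might be backed by CIM, IPMI, or SNMP queries rather than a fixed list.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Component:
    """Common, shallow view of one resource: enough to decide 'present and working'."""
    kind: str          # e.g. "storage", "network", "cpu" (illustrative categories)
    name: str
    operational: bool


class DiscoveryProvider(Protocol):
    """Hypothetical common discovery interface shared by all platforms."""

    def discover(self) -> list[Component]:
        ...

    def details(self, name: str) -> dict:
        """Platform/vendor/OS-specific extension for deeper information."""
        ...


class StaticProvider:
    """Stand-in provider returning a fixed inventory, for illustration only."""

    def __init__(self, components: list[Component]):
        self._components = components

    def discover(self) -> list[Component]:
        return list(self._components)

    def details(self, name: str) -> dict:
        return {"name": name, "vendor_specific": "not available in this stub"}


def can_run(provider: DiscoveryProvider, required_kinds: set[str]) -> bool:
    # The application can run only if every resource kind it needs has at
    # least one component that exists and is operating.
    working = {c.kind for c in provider.discover() if c.operational}
    return required_kinds <= working


provider = StaticProvider([
    Component("storage", "disk0", True),
    Component("network", "eth0", False),
    Component("network", "eth1", True),
])
print(can_run(provider, {"storage", "network"}))  # True
```

Keeping `discover` shallow lets the same check run unchanged on every platform; only an application that genuinely needs more calls `details`.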

An additional area of concentration should be software configuration: the ability to survey, or inventory, the software configuration of the system. The information needed includes service, library, file, and version information.
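A software inventory of the kind described above can be represented as a list of records covering services, libraries, and files with their versions, which an application then checks against its own requirements. This is a minimal sketch under stated assumptions: `SoftwareItem`, `check_inventory`, and the item names are hypothetical, and versions are assumed to be dotted integers (for example `2.4.1`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareItem:
    kind: str      # "service", "library", or "file"
    name: str
    version: str   # assumed dotted-integer form, e.g. "2.4.1"
    path: str = ""


def parse_version(v: str) -> tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))


def version_less(a: str, b: str) -> bool:
    # Pad with zeros so that "2.4" and "2.4.0" compare as equal.
    x, y = parse_version(a), parse_version(b)
    n = max(len(x), len(y))
    return x + (0,) * (n - len(x)) < y + (0,) * (n - len(y))


def check_inventory(inventory: list[SoftwareItem], minimum: dict[str, str]) -> list[str]:
    """Return problems: required items that are absent or older than required."""
    found = {item.name: item for item in inventory}
    problems = []
    for name, min_version in sorted(minimum.items()):
        item = found.get(name)
        if item is None:
            problems.append(f"{name}: missing")
        elif version_less(item.version, min_version):
            problems.append(f"{name}: {item.version} < required {min_version}")
    return problems


inventory = [
    SoftwareItem("library", "libha", "2.3.9"),
    SoftwareItem("service", "ckptd", "1.0"),
]
print(check_inventory(inventory, {"libha": "2.4", "ckptd": "1.0.0", "logd": "1.0"}))
# ['libha: 2.3.9 < required 2.4', 'logd: missing']
```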

Finally, the interdependencies among the processes, services, and daemons that some applications use need to be determined. Items such as pipelines, shared memory segments, and protocol stacks can all be shared between applications, which complicates the dependency tree.
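Because shared resources turn the dependency tree into a general graph, a useful question is which applications are affected, directly or transitively, when one resource fails. The sketch below answers it with a breadth-first search over inverted dependency edges; `affected_by` and the entity names are hypothetical, and the dependency map is assumed to have been produced by discovery.

```python
from collections import defaultdict, deque


def affected_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every entity that directly or transitively depends on `failed`."""
    # Invert the edges: for each resource, record who uses it.
    users: defaultdict[str, list[str]] = defaultdict(list)
    for entity, deps in depends_on.items():
        for dep in deps:
            users[dep].append(entity)

    # Breadth-first walk outward from the failed resource.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for user in users[current]:
            if user not in affected:
                affected.add(user)
                queue.append(user)
    affected.discard(failed)  # a dependency cycle could lead back to the failed item
    return affected


deps = {
    "billing-app": ["db-proxy", "shm:rates"],  # shm: a shared memory segment
    "report-app": ["db-proxy"],
    "db-proxy": ["tcp-stack"],                 # a shared protocol stack
    "rate-loader": ["shm:rates"],
}
print(sorted(affected_by("tcp-stack", deps)))  # ['billing-app', 'db-proxy', 'report-app']
print(sorted(affected_by("shm:rates", deps)))  # ['billing-app', 'rate-loader']
```

Inverting the edges once makes each failure query proportional to the number of affected entities rather than to the size of the whole graph.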

11.2 Notification

Applications need timely notification of the events they are interested in. These events could come from hardware or other components in the system, or take the form of a heartbeat from a mated application running elsewhere. Again, a common interface should provide the means of hooking into asynchronous notification for a reasonably wide range of functions; additional detailed information can then be extracted through more specific methods. The event notification should allow for some level of filtering. Common methods would be to use a publish/subscribe to