Providing Open Architecture High Availability Solutions

11.0 Layer-Specific Capabilities – Applications
When reviewing capabilities for high availability systems from an application perspective, it is
important to understand the application objectives. Many different applications are required to
make up a system. There are many ways to classify applications and their capabilities to
participate in, control, or operate within a highly available system. Since most applications are
not specifically focused on fault management, the system should support that task by providing
a management interface that allows an application to monitor and control operations. Additionally, an
application should be able to supply its health information to a management tracking function.
To that end, the following groups can be defined for application capabilities:
Status
Notification
Failover
Recovery
Resilience
11.1 Status
An application needs to know enough about the system configuration to determine whether it is
capable of running. For example, it must know whether the hardware, storage, and communications
paths it relies on exist and are operating. This requires that the application be able to extract
knowledge of the system through a process called discovery. This information can be very platform-
and software-specific, but general concepts can still be applied. Simple management interfaces
already exist and can be leveraged to help with discovery. Internal standards such as CIM and IPMI
can be used, as well as network-based management interfaces such as SNMP, to provide some or all
of the information. A practical approach to retrieving status is to aim for the widest breadth of
information at a limited depth. This could be provided in a common way on all platforms and
operating environments. If the application needs additional hardware-specific information, then it
should be able to get that using a platform-, vendor-, or operating-system-specific method.
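The "wide breadth, limited depth" idea can be illustrated with a minimal sketch in Python. It is not an interface defined by this document: the function names (common_status, can_run, detailed_status) and the PLATFORM_PROBES registry are hypothetical, and the checks use only portable standard-library calls rather than CIM, IPMI, or SNMP.

```python
import os
import platform
import shutil
import socket


def common_status(storage_path="/"):
    """Collect a broad, shallow status summary using only portable calls."""
    present = os.path.exists(storage_path)
    free = shutil.disk_usage(storage_path).free if present else 0
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "architecture": platform.machine(),
        "cpu_count": os.cpu_count(),
        "storage_present": present,
        "storage_free_bytes": free,
    }


def can_run(status, min_free_bytes=1):
    """Decide from the common summary whether the application can run here."""
    return status["storage_present"] and status["storage_free_bytes"] >= min_free_bytes


# Hypothetical registry of deeper probes, keyed by operating system name.
# A vendor- or platform-specific module would add its own entry here.
PLATFORM_PROBES = {}


def detailed_status(status):
    """Return platform-specific detail if a probe is registered, else nothing."""
    probe = PLATFORM_PROBES.get(status["os"])
    return probe() if probe else {}
```

The common summary is the same on every platform, so the run/no-run decision does not depend on vendor tooling; deeper hardware detail is requested only when a platform-specific probe has been registered.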
An additional area of concentration should be software configuration. This is the ability to survey
or inventory the software configuration of the system. The information that is needed includes
service, library, file, and version information.
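As a rough illustration of such a survey, the following Python sketch gathers library versions and file records into one inventory. The function names and record layout are assumptions made for this example. Service enumeration is left out because it depends on the platform's service manager.

```python
import hashlib
import os
import sys
from importlib import metadata


def library_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def file_record(path):
    """Describe one file: presence, size, and a content digest for comparison."""
    if not os.path.isfile(path):
        return {"path": path, "present": False}
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return {
        "path": path,
        "present": True,
        "size": os.path.getsize(path),
        "sha256": digest,
    }


def inventory(libraries, files):
    """Build a software inventory covering runtime, libraries, and files."""
    return {
        "runtime_version": sys.version.split()[0],
        "libraries": library_versions(libraries),
        "files": [file_record(path) for path in files],
    }
```

Comparing two such inventories, for example from an active node and its standby, is one way to detect a version mismatch before a failover depends on it.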
Finally, the interdependencies among the processes, services, and daemons used by applications
need to be determined. Items such as pipelines, shared memory, and protocol stacks can
all be shared between applications, which complicates the dependency tree.
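One way to reason about such a dependency tree is to record it as a graph and query it. The sketch below is illustrative only: the component names in DEPENDS_ON are invented, and it uses Python's standard graphlib module (Python 3.9 or later) to derive a start order, plus a simple traversal to find everything affected when a shared item fails.

```python
from graphlib import TopologicalSorter

# Hypothetical example: each key depends on the items in its set.
DEPENDS_ON = {
    "billing_app": {"shared_memory", "protocol_stack"},
    "logging_app": {"pipeline"},
    "pipeline": {"protocol_stack"},
    "shared_memory": set(),
    "protocol_stack": set(),
}


def start_order(depends_on):
    """Return an order in which every item starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def affected_by(failed, depends_on):
    """Return every item that depends, directly or indirectly, on `failed`."""
    affected = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for name, deps in depends_on.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected
```

In this example a failure of protocol_stack affects billing_app directly and logging_app indirectly through pipeline, which is the kind of sharing that makes the tree hard to read by inspection.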
11.2 Notification
Applications need timely notification of the events they are interested in. These events could come
from hardware or other components in the system, or take the form of a heartbeat from a mated
application running elsewhere. As with status, asynchronous notification covering a reasonably
wide range of functions should be delivered through a common interface.
Additional detailed information can then be extracted in platform-specific ways. The event
notification mechanism should allow for some level of filtering. A common method is a
publish/subscribe model, in which the application registers so that others can see and manage it.
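A publish/subscribe interface with filtering can be sketched in a few lines of Python. Everything here is hypothetical and in-process only: the Event fields, the EventBus class, and the HeartbeatWatcher helper are names chosen for illustration, not an API defined by this document.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str        # component or peer that raised the event
    kind: str          # for example "fault", "heartbeat", or "info"
    detail: str = ""


class EventBus:
    """Registers subscribers and delivers only the events they asked for."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, name, callback, kinds=None, sources=None):
        """Register a named subscriber; None means no filter on that field."""
        self._subscribers.append((name, callback, kinds, sources))

    def subscribers(self):
        """List registered names so a manager can see who is listening."""
        return [name for name, *_ in self._subscribers]

    def publish(self, event):
        """Deliver the event to matching subscribers; return how many got it."""
        delivered = 0
        for _name, callback, kinds, sources in self._subscribers:
            if kinds is not None and event.kind not in kinds:
                continue
            if sources is not None and event.source not in sources:
                continue
            callback(event)
            delivered += 1
        return delivered


class HeartbeatWatcher:
    """Tracks heartbeat events from mated applications and flags silent peers."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen = {}

    def record(self, event):
        """Callback for the bus: note when this peer was last heard from."""
        self._last_seen[event.source] = self._clock()

    def silent_peers(self):
        """Return peers whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            peer for peer, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

An application would subscribe watcher.record with kinds={"heartbeat"} and poll silent_peers() periodically. Filtering happens at the bus, so subscribers never see events they did not register for, and the subscribers() list gives a management function a view of what is registered.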