Providing Open Architecture High Availability Solutions
A process can also be forced to exit asynchronously by sending it a kill signal, which can aid
fault diagnosis. The ability to recover resources from a faulted, or shed, process aids
fault recovery. The open-standard process ID (PID) structures allow the middleware to easily
determine which applications are currently loaded and running.
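As a minimal POSIX sketch of forcing a process to exit and recovering its resources, assuming the middleware is the parent of the managed process (waitpid() can only reap children), the helpers below check liveness with signal 0, send SIGKILL, and reap the child so the kernel releases its process-table entry. All function names are hypothetical, and spawn_idle_child() merely stands in for launching a real application.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a managed application: a child that idles. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();          /* wait for signals forever */
    }
    return pid;               /* child PID, or -1 if fork() failed */
}

/* Returns 1 if a process with this PID exists. Signal 0 performs only the
 * existence and permission checks; nothing is delivered. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;    /* exists, but we lack permission */
}

/* Forcibly terminate a child and reap it so its process-table entry is
 * released. Returns the terminating signal, or -1 on error. */
int force_exit_and_reap(pid_t child)
{
    int status;

    if (kill(child, SIGKILL) != 0)
        return -1;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)   /* retry only if interrupted */
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

SIGKILL cannot be caught or ignored, so a supervisor would normally try SIGTERM first and fall back to SIGKILL only for a process that does not respond.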
Dynamic Loading
Dynamic processes can be loaded, executed and unloaded without restarting the system or OS.
This allows the system to be dynamically reconfigured or upgraded without having to reboot.
Moreover, since processes are completely contained within virtual address spaces, restarting a
single process after a failure is a safe option and obviates the need to restart the entire system. Such
dynamic restart capabilities at the application and driver levels provide a foundation for high
availability and allow hot upgrades of executing software as well as device drivers and their related
hardware. The ability to dynamically create new processes and load programs may also be used for
application rollback under the direction of the middleware. This capability is also useful for
clearing transient software faults.
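Because a failed process is confined to its own address space, a supervisor can relaunch just that process. The sketch below assumes a deliberately simple, hypothetical policy in run_with_restart(): restart on any signal or non-zero exit status, up to a fixed bound. Real middleware would add back-off, escalation, and possibly a rollback to an earlier program version.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch argv[0] (searched on PATH) and restart it whenever it terminates
 * abnormally, i.e. killed by a signal or exiting non-zero, at most
 * max_restarts times. Returns the number of restarts needed before a clean
 * exit, or -1 if it still failed (or fork() failed). */
int run_with_restart(char *const argv[], int max_restarts)
{
    for (int restarts = 0; restarts <= max_restarts; restarts++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127);                      /* exec failed */
        }

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;                 /* clean exit */
        /* Abnormal exit: only this process is relaunched; nothing else
         * in the system is restarted. */
    }
    return -1;
}
```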
9.3.3 I/O Device Drivers
The OS layer provides access to the input/output system. I/O requests are passed to the device
drivers that service that type of request. The device drivers handle the hardware-specific aspects of
I/O so that applications can be hardware independent. Because all I/O access goes through the
kernel, I/O requests can be redirected should the system be dynamically reconfigured. This feature
can be used to mask I/O failures from the application, provided the application addresses a virtual
device, or to implement another form of hardware redundancy in which a standby device takes over the I/O operation.
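In the OS this redirection happens inside the kernel, beneath the application. The following is only an application-level analogue of the same idea: a hypothetical "virtual device" (struct redundant_dev, rdev_write()) backed by a primary and a standby file descriptor, where a failed write on the active path is retried on the other path so the caller never sees the single failure. A production version writing to pipes or sockets would also need to ignore SIGPIPE.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical virtual device: one logical output backed by a primary
 * and a standby file descriptor. */
struct redundant_dev {
    int fds[2];      /* fds[0] = primary path, fds[1] = standby path */
    int active;      /* index of the path currently in use */
};

/* Write through the active path; on failure, switch to the other path and
 * retry once. Returns bytes written, or -1 if both paths fail. */
ssize_t rdev_write(struct redundant_dev *dev, const void *buf, size_t len)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        ssize_t n;
        do {
            n = write(dev->fds[dev->active], buf, len);
        } while (n < 0 && errno == EINTR);

        if (n >= 0)
            return n;
        dev->active = 1 - dev->active;   /* fail over to the other path */
    }
    return -1;
}
```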
Dynamic Device Drivers
An OS capability that allows device drivers to be dynamically loaded supports the in-place
provisioning of new hardware without restarting the OS.
9.3.4 Signal IPC Mechanism
An OS that supports signals can use this capability for asynchronous notification and control of
threads. Signals can be used to start, stop, or force the exit of threads or processes.
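A minimal POSIX sketch of signal-based control, assuming the controlling middleware is the parent of the target process: SIGSTOP suspends it, SIGCONT resumes it, and SIGTERM asks it to exit, with waitpid() confirming each state change. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a managed application: a child that idles. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}

/* Stop, resume, and then terminate a child with signals, confirming each
 * state change through waitpid(). Returns 0 on success, -1 otherwise. */
int stop_resume_terminate(pid_t child)
{
    int status;

    /* SIGSTOP cannot be caught or ignored; WUNTRACED reports the stop. */
    if (kill(child, SIGSTOP) != 0 ||
        waitpid(child, &status, WUNTRACED) != child || !WIFSTOPPED(status))
        return -1;

    /* SIGCONT resumes the process; WCONTINUED reports the resume. */
    if (kill(child, SIGCONT) != 0 ||
        waitpid(child, &status, WCONTINUED) != child || !WIFCONTINUED(status))
        return -1;

    /* SIGTERM requests exit; its default action terminates the process. */
    if (kill(child, SIGTERM) != 0 ||
        waitpid(child, &status, 0) != child ||
        !WIFSIGNALED(status) || WTERMSIG(status) != SIGTERM)
        return -1;
    return 0;
}
```

Within a single process, pthread_kill() directs a signal at one thread, although stop and continue signals always act on the whole process.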
In the case of a CPU fault, the signal sent to the thread running the code that caused the fault provides
a dependable method for logging and reporting the fault.
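A minimal sketch of fault logging, assuming a POSIX system: a handler installed with sigaction() for the synchronous fault signals runs on the faulting thread, logs using only async-signal-safe calls, and exits with a distinctive status that a supervising parent can recognize. FAULT_EXIT_CODE and the function names are hypothetical choices, not part of any standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit code telling the supervising middleware that a CPU
 * fault was caught and logged before the process exited. */
#define FAULT_EXIT_CODE 70

/* Runs on the faulting thread. Only async-signal-safe calls are used. */
static void fault_handler(int sig, siginfo_t *info, void *ctx)
{
    static const char msg[] = "fault handler: fatal CPU exception logged\n";
    (void)sig;
    (void)info;   /* info->si_addr identifies the fault location */
    (void)ctx;
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(FAULT_EXIT_CODE);  /* never return into the faulting code */
}

/* Install the handler for the usual synchronous CPU-fault signals. */
int install_fault_logging(void)
{
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGFPE, SIGILL };
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Parent (middleware) side: did the child exit through the fault handler? */
int exited_after_logged_fault(int wait_status)
{
    return WIFEXITED(wait_status) && WEXITSTATUS(wait_status) == FAULT_EXIT_CODE;
}
```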
Signal timers can be used on a per-thread basis to provide a software watchdog function. The
timers can also generate a periodic signal to remind the process to use the middleware
checkpoint capabilities.
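A minimal sketch of a software watchdog built on a signal timer, assuming POSIX setitimer(). It is per-process rather than per-thread, because setitimer() delivers SIGALRM to the process; a per-thread watchdog would need thread-directed timers, which are not shown. The function names are hypothetical, and monitored_step() simulates work with nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Set by the SIGALRM handler when the watchdog expires. */
static volatile sig_atomic_t watchdog_expired = 0;

static void on_watchdog(int sig)
{
    (void)sig;
    watchdog_expired = 1;   /* real middleware might log or restart here */
}

int watchdog_init(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_watchdog;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Arm, or re-arm ("kick"), a one-shot timer that raises SIGALRM after
 * timeout_ms unless kicked again first. A timeout of 0 disarms it. */
int watchdog_kick(long timeout_ms)
{
    struct itimerval it;
    memset(&it, 0, sizeof it);            /* it_interval = 0: one-shot */
    it.it_value.tv_sec = timeout_ms / 1000;
    it.it_value.tv_usec = (timeout_ms % 1000) * 1000;
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Hypothetical unit of monitored work that takes work_ms. Returns 1 if
 * the watchdog fired before the work finished, else 0. */
int monitored_step(long work_ms, long timeout_ms)
{
    struct timespec work = { .tv_sec = work_ms / 1000,
                             .tv_nsec = (work_ms % 1000) * 1000000L };

    watchdog_expired = 0;
    watchdog_kick(timeout_ms);
    nanosleep(&work, NULL);   /* cut short (EINTR) if SIGALRM arrives */
    watchdog_kick(0);         /* step done: disarm */
    return watchdog_expired;
}
```

Setting it_interval to a non-zero period makes the same kind of timer periodic, which is one way to deliver the checkpoint reminder described above. A process has only one ITIMER_REAL, so a watchdog and a reminder running together would need separate timers, for example ones created with POSIX timer_create().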
9.3.5 Management Access to Kernel Information Structures and Process States
Accepted industry-standard information blocks (such as the process ID table, memory in use, free disk
blocks, and CPU idle time) provide a mechanism for external access to, and visibility into, the
OS layer. Access to system information through standard calling mechanisms provides
well-understood programmatic access to OS layer information. Both the application and middleware
layers frequently use this interface to the OS.
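As a minimal sketch, the POSIX statvfs() and getrusage() calls cover two of the information blocks named above: free disk blocks and CPU usage. The struct and function names are hypothetical. The PID table and system-wide CPU idle time have no portable POSIX query, so they are omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/statvfs.h>
#include <sys/time.h>

/* Hypothetical health record a management agent might export. */
struct health_snapshot {
    unsigned long long disk_free_bytes;   /* available to unprivileged users */
    unsigned long long disk_total_bytes;
    long long cpu_user_us;                /* user CPU time of this process */
    long long cpu_system_us;              /* kernel CPU time on its behalf */
};

static long long timeval_us(struct timeval tv)
{
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Fill *out from standard POSIX queries. Returns 0 on success, -1 on error. */
int take_snapshot(const char *mount_point, struct health_snapshot *out)
{
    struct statvfs vfs;
    struct rusage ru;

    if (statvfs(mount_point, &vfs) != 0 || getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    /* Block counts are in units of f_frsize (the fragment size). */
    out->disk_free_bytes  = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    out->disk_total_bytes = (unsigned long long)vfs.f_blocks * vfs.f_frsize;
    out->cpu_user_us   = timeval_us(ru.ru_utime);
    out->cpu_system_us = timeval_us(ru.ru_stime);
    return 0;
}
```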