Providing Open Architecture High Availability Solutions

Typically, these will involve monitoring analog values that reflect the health of the hardware
even when no fault has occurred. For example, a fan may be slowing down while still functioning
within specifications. This may indicate that a bearing is wearing out, and with this warning the fan
can be replaced before a fault occurs. In another example, monitoring a temperature may reveal
an impending problem before any component has actually failed, triggering a response that brings the
temperature back into a safer range before a fault occurs.
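The fan example can be made concrete with a small trend check. The following is a minimal sketch, not a prescribed method: the function names, the 3000 RPM specification floor, and the decline limit of 20 RPM per sample are all assumptions chosen for illustration. It fits a least-squares slope to recent readings and raises a warning when the speed is falling steadily even though the latest reading is still within specification.

```python
from statistics import mean


def slope_per_sample(readings):
    """Least-squares slope of the readings against their sample index."""
    n = len(readings)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(readings)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(readings))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def assess_fan(readings_rpm, min_spec_rpm=3000, max_decline_rpm=-20.0):
    """Classify a fan as 'fault', 'warning', or 'ok'.

    min_spec_rpm and max_decline_rpm are hypothetical thresholds; a real
    platform would take them from the component's own specification.
    """
    if not readings_rpm:
        raise ValueError("at least one reading is required")
    if readings_rpm[-1] < min_spec_rpm:
        return "fault"      # already outside specification
    if slope_per_sample(readings_rpm) <= max_decline_rpm:
        return "warning"    # within specification, but degrading
    return "ok"


if __name__ == "__main__":
    print(assess_fan([5000, 4970, 4940, 4910, 4880]))
```

A fan losing about 30 RPM per sample is reported as a warning long before it drops below the specification floor, which is the window in which it can be replaced without a fault ever occurring.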
8.4 Open Architecture Solution for Hardware Capabilities
As described above, a fault-managed high availability system requires a large number of hardware
monitoring and control capabilities, but the specific capabilities required vary significantly
from one system architecture to another.
To support this variety of requirements in an open-standard way, the platform management system
itself needs to have certain key capabilities. Among these are:
• An industry recognized interface to the operating system and management middleware
• The ability to be self-defining. The platform management system should be able to identify
  what capabilities it has so that operating systems and management software can adapt to the
  specifics of a particular platform.
• Use of an industry recognized management bus for intra-system communication. This allows
  for the interoperability of managed hardware components from various vendors with the
  overall platform management infrastructure.
• An industry recognized standard for implementing and controlling hot-swap
• An industry recognized hardware interface for key platform components such as power
  supplies, cooling units, system boards, and peripheral boards
• An industry recognized interface for platform alarming
• An industry recognized standard for redundant intra-system interconnects
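To illustrate what "self-defining" could mean in practice, the sketch below shows management software building its monitoring plan from capability records reported by the platform, rather than from a hard-coded hardware list. Everything here is hypothetical: the Capability record, its fields, and the describe_platform function are illustrative stand-ins and do not correspond to any particular standard's data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical record describing one monitored or controlled item."""
    component: str              # e.g. "fan1"
    kind: str                   # e.g. "fan_speed", "temperature"
    unit: str                   # e.g. "RPM", "degC"
    hot_swappable: bool = False


def describe_platform():
    """Stand-in for a query to the platform management system.

    A real self-defining platform would report these records itself;
    they are hard-coded here purely for illustration.
    """
    return [
        Capability("fan1", "fan_speed", "RPM", hot_swappable=True),
        Capability("fan2", "fan_speed", "RPM", hot_swappable=True),
        Capability("cpu0", "temperature", "degC"),
        Capability("psu1", "voltage", "V", hot_swappable=True),
    ]


def build_monitor_plan(capabilities):
    """Group components by capability kind so software can adapt to them."""
    plan = {}
    for cap in capabilities:
        plan.setdefault(cap.kind, []).append(cap.component)
    return plan


if __name__ == "__main__":
    for kind, components in build_monitor_plan(describe_platform()).items():
        print(kind, components)
```

Because the plan is derived from what the platform reports, the same management software could run unchanged on a chassis with a different number of fans or power supplies.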
8.5 Standards
8.5.1 IPMI
One open standard that meets the above hardware capability requirements well is the
Intelligent Platform Management Interface (IPMI) specification. This
specification is available at http://developer.intel.com/design/servers/ipmi.
IPMI defines an open-standard abstraction interface and protocol targeted at component level
platform management, which supports the sorts of capabilities described in the preceding
paragraphs.
IPMI also defines a standard interconnect to support communication between platform components
within the hardware layer and between the hardware and operating system layers. Communication
between components is provided by the Intelligent Platform Management Bus (IPMB), which is
typically bused throughout the chassis to connect all platform components including cooling units
and power supplies. The IPMI specification also defines standard communication channels
between the hardware platform and the operating system. These communication channels allow the
operating system, management middleware, and applications to access platform management
capabilities of the hardware. Other key capabilities defined by the IPMI specification include: