Installation guide
1-6 Fault Tolerant System Administration (R1004H) HP-UX version 11.00.03
Fault Tolerant Design
Continuum systems are fault tolerant; that is, they continue operating even if
major components fail. Continuum systems provide both hardware and software
features that maximize system availability.
Fault Tolerant Hardware
The fault tolerant hardware features include the following:
■ Continuum systems employ a parallel pair and spare architecture for most
hardware components that lets two physical components operate either as a
true lock-step pair (identical and precisely parallel simultaneous actions) or as
an online/standby pair. In either case, the pair operates as a single unit, which
provides fault tolerance if one of the components should fail.
■ Continuum systems consist of modularized hardware components designed
for easy servicing and replacement. Many hardware components (such as
suitcases or CPU/memory boards, I/O controller cards, disk and tape devices,
and power supplies) are customer-replaceable units (CRUs) that system
administrators can replace on site with minimal training or tools. Most other
hardware components are field-replaceable units (FRUs) that trained Stratus
personnel can replace on site.
■ Some components are hot-pluggable; that is, the system administrator can
replace them without interrupting system services. You can also upgrade
some components dynamically.
■ Most components have self-checking diagnostics that identify problems and
alert the system to them. When a diagnostic program detects a fault, it sends a
message to the fault tolerant services (FTS) software subsystem. The FTS
subsystem continuously monitors and evaluates hardware and software problems
and initiates corrective actions.
■ Most components include a set of status lights that immediately show an
administrator the status of the component.
■ Continuum Series 400/400-CO systems boot from a 20-MB PCMCIA flash
card.
■ All Continuum systems include a port that you can configure and connect to an
uninterruptible power supply (UPS). All Continuum systems also provide logic
for two levels of power failure protection: “ride-through” protection, in which
batteries power the system without interruption during short outages, and full
shutdown protection and recovery for longer outages that require a machine
shutdown.
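The lock-step case of the parallel pair and spare architecture can be illustrated with a small simulation. This is a hypothetical Python model only: the class names, the `compute` callables, and the event strings are illustrative assumptions, and the sketch does not describe how Continuum hardware or the FTS subsystem actually implement fault detection. It does not model the online/standby case. Each pair runs the same step on both of its boards; if the two results disagree, the pair takes itself offline (a stand-in for self-checking), and the partner pair keeps the system running.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Board:
    """One physical component (e.g. a CPU/memory board) in this hypothetical model."""
    name: str
    compute: Callable[[int], int]


@dataclass
class LockStepPair:
    """Two boards executing the same step; any disagreement marks the pair faulty."""
    name: str
    a: Board
    b: Board
    online: bool = True

    def step(self, x: int) -> Optional[int]:
        ra, rb = self.a.compute(x), self.b.compute(x)
        if ra != rb:
            # Self-checking: the pair cannot tell which half is wrong, so it goes offline.
            self.online = False
            return None
        return ra


class PairAndSpare:
    """Two lock-step pairs acting as a single logical unit."""

    def __init__(self, primary: LockStepPair, spare: LockStepPair) -> None:
        self.pairs = [primary, spare]
        # Stands in for fault messages sent to a monitoring subsystem (assumption).
        self.events: List[str] = []

    def step(self, x: int) -> int:
        result: Optional[int] = None
        for pair in self.pairs:
            if not pair.online:
                continue
            r = pair.step(x)
            if r is None:
                self.events.append(f"fault: {pair.name} taken offline")
            elif result is None:
                result = r
        if result is None:
            raise RuntimeError("no healthy pair remains")
        return result


if __name__ == "__main__":
    good = lambda x: x * 2
    bad = lambda x: x * 2 + 1  # simulated failing board
    system = PairAndSpare(
        LockStepPair("pair0", Board("cpu0a", good), Board("cpu0b", bad)),
        LockStepPair("pair1", Board("cpu1a", good), Board("cpu1b", good)),
    )
    print(system.step(21))  # 42: pair1 supplies the result
    print(system.events)    # ['fault: pair0 taken offline']
```

Because both pairs run every step, losing one pair requires no restart or state transfer in this model; the remaining pair simply continues to supply results, which mirrors the "operates as a single unit" behavior described above.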