Installation guide
HP-UX version 11.00.03 Administering Fault Tolerant Hardware 5-29
hw_path is the hardware path of the device.
If the device does not change to CLAIMED, call the CAC for further assistance. For
more information about contacting the CAC, see the Preface of this manual.
Managing MTBF Statistics
The system maintains statistics on the mean time between failures (MTBF) for each
hardware device in the system. The following sections describe how the MTBF is
calculated; how to display, clear, and set the MTBF threshold; and how to
configure two other important variables: the minimum number of samples,
numsamp, and the soft error weightage, soft_wt.
For more information about the hard and soft errors that trigger the system to
evaluate the MTBF, see “Error Detection and Handling.”
MTBF Calculation and Effects
For each error that occurs, the system performs certain calculations.
If the error is a hard error, the system records the time of the error and increments
the total error count. Then the system takes the device out of service and places it
in the ERROR state. Finally, the system calculates the MTBF (see footnote 1) and
compares it with the threshold. One of the following occurs:
■ If the MTBF is less than the threshold, the system leaves the device in the
ERROR state.
■ If the MTBF is greater than the threshold, the system attempts to enable the
device and return it to the CLAIMED state.
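The hard-error steps above, together with the footnote rule about numsamp, can be sketched in Python as follows. This is an illustrative model only, not the system's actual implementation. The class name DeviceStats and its methods are hypothetical, and the manual does not give the MTBF formula, so the sketch assumes MTBF is the mean interval between the last numsamp recorded error times. It also assumes times are in seconds, that numsamp is at least 2, that an MTBF exactly equal to the threshold leaves the device in the ERROR state, and that the attempt to re-enable the device succeeds.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeviceStats:
    """Hypothetical per-device MTBF bookkeeping (illustration only)."""
    threshold: float              # MTBF threshold; seconds is an assumed unit
    numsamp: int                  # minimum number of samples before MTBF is calculated
    total_errors: int = 0
    state: str = "CLAIMED"
    error_times: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption of this sketch: at least two samples are needed to form an interval.
        if self.numsamp < 2:
            raise ValueError("this sketch assumes numsamp >= 2")

    def mtbf(self) -> Optional[float]:
        """Return the MTBF over the last numsamp errors, or None if not yet calculated."""
        if self.total_errors < self.numsamp:
            return None
        recent = self.error_times[-self.numsamp:]
        # Assumed formula: mean interval between consecutive recorded errors.
        return (recent[-1] - recent[0]) / (self.numsamp - 1)

    def record_hard_error(self, now: float) -> str:
        """Apply the hard-error steps and return the resulting device state."""
        self.error_times.append(now)   # record the time of the error
        self.total_errors += 1         # increment the total error count
        self.state = "ERROR"           # take the device out of service
        value = self.mtbf()
        # An uncalculated MTBF is treated as greater than the threshold.
        if value is None or value > self.threshold:
            self.state = "CLAIMED"     # re-enable attempt assumed to succeed
        return self.state
```

For example, with a threshold of 100 and numsamp set to 3, the first two hard errors return the device to CLAIMED because the MTBF has not yet been calculated; a third error 20 seconds after the first gives an MTBF of 10 under the assumed formula, so the device stays in ERROR.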
If the error is a soft error, the system increments the soft error count and compares
the soft error count to the soft_wt variable. One of the following occurs:
■ If the soft error count is less than the soft_wt variable, the system takes no
further action and continues to monitor the device for errors.
■ If the soft error count equals the soft_wt variable, the system records the
time of the error, increments the total error count, and clears the soft error
count. Then the system calculates the MTBF and compares it with the
threshold. One of the following occurs:
Footnote 1: The system does not calculate MTBF until the total error count equals
the numsamp variable, and then it uses the recorded times of the last numsamp
errors to calculate MTBF. If MTBF has not yet been calculated, the system
considers the MTBF value unreliable and acts as if MTBF is greater than the
threshold.