Advanced memory protection for HP ProLiant 300 series G4 servers

As mentioned previously, the ProLiant 300-series G4 servers periodically verify the online spare bank
during normal operation. If a potential uncorrectable error in the spare bank is detected, support for
switching over to the spare bank will be disabled. The health driver will log a message to the console
and to the IML. In addition, the internal health LED will indicate a degraded state, the Online Spare
Status LED will illuminate amber, and the DIMM LEDs for the spare DIMMs will illuminate amber. If a
potential uncorrectable error is detected in the online spare bank prior to an online spare switchover,
the system will continue to operate normally, just without the protection of Online Spare mode. The
system will not crash or generate an NMI.

Online Spare mode does not fully protect a system from uncorrectable errors. An uncorrectable
memory error will still result in a system crash and an NMI. However, a system in Online Spare
mode has a reduced probability of
receiving an uncorrectable error because DIMMs that are exceeding the correctable error threshold,
and thus at higher risk of receiving an uncorrectable error, are deactivated. The system no longer has
a spare bank available once either of two events has occurred: the switchover to the spare bank has
taken place, or support for switching over has been disabled because the system detected a potential
uncorrectable error in the inactive online spare bank. If the correctable error threshold is then
exceeded on another bank, the event is treated as if the system were in Advanced ECC mode. The health
driver reports the event and the failed DIMM’s LED illuminates, but no switchover occurs and the
failing memory remains active. The system can be powered off at the user’s convenience to replace the
failed DIMMs. It is important to note that after each reboot, the system will again attempt to boot
using the original memory rather than the spare bank. In this case, the original memory is expected
to fail again at some point, and the spare bank will again become active.
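
The behavior described above can be summarized as a small state model. The following Python sketch is illustrative only: the class, method, and field names are hypothetical and do not correspond to any HP firmware or health driver interface. It also assumes that a reboot simply returns the system to the original memory and leaves any "spare disabled" condition unchanged, which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineSpareModel:
    """Toy model of Online Spare behavior (all names are hypothetical)."""
    switched_over: bool = False   # spare bank is currently the active bank
    spare_disabled: bool = False  # switchover support disabled after a spare-bank fault
    log: list = field(default_factory=list)

    @property
    def spare_available(self) -> bool:
        # A spare is available only if it is neither in use nor disabled.
        return not (self.switched_over or self.spare_disabled)

    def spare_bank_fault_detected(self) -> None:
        """Potential uncorrectable error found in the inactive spare bank."""
        if self.switched_over:
            return  # the text only covers detection prior to a switchover
        self.spare_disabled = True
        self.log.append("spare bank fault: switchover disabled, system keeps running")

    def correctable_threshold_exceeded(self, bank: str) -> str:
        """Return the action taken when a bank exceeds the correctable error threshold."""
        if self.spare_available:
            self.switched_over = True
            action = "switchover"
        else:
            # Handled as in Advanced ECC mode: reported, failing memory stays active.
            action = "report-only"
        self.log.append(f"{bank}: {action}")
        return action

    def reboot(self) -> None:
        """The system boots from the original memory again (assumed reset)."""
        self.switched_over = False


model = OnlineSpareModel()
print(model.correctable_threshold_exceeded("bank A"))  # switchover
print(model.correctable_threshold_exceeded("bank B"))  # report-only
model.reboot()
print(model.correctable_threshold_exceeded("bank A"))  # switchover
```

The last three lines reproduce the reboot scenario: if the failed DIMMs are not replaced, the original bank is used again after the reboot, exceeds the threshold again, and the spare bank becomes active once more.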