162 IBM Power 770 and 780 Technical Overview and Introduction
Finally, if an uncorrectable error in memory is discovered, the POWER Hypervisor marks the
logical memory block that contains the failing address for deallocation. If that logical memory
block is assigned to an active partition at the time of the fault, the deallocation takes effect
when the partition is rebooted.
In addition, the system deallocates the entire memory group associated with the error on all
subsequent system reboots until the memory is repaired. This is intended to guard against
further uncorrectable errors while replacement parts are pending.
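The bookkeeping described above can be sketched as a small state model. This is a toy illustration, not the POWER Hypervisor's implementation: the class and method names are hypothetical, and two behaviors are assumptions rather than statements from the text. The first assumption is that a block not owned by an active partition is removed immediately. The second is that a repair clears every mark in the affected group.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class MemoryGuardModel:
    """Toy model of the deallocation rules above (all names are hypothetical)."""
    lmb_owner: Dict[int, Optional[str]]  # LMB id -> owning partition, or None if unassigned
    lmb_group: Dict[int, int]            # LMB id -> memory group id
    active_partitions: Set[str] = field(default_factory=set)
    pending: Set[int] = field(default_factory=set)         # marked, waiting for a partition reboot
    deallocated: Set[int] = field(default_factory=set)     # removed from use
    guarded_groups: Set[int] = field(default_factory=set)  # excluded on every system boot

    def on_uncorrectable_error(self, lmb: int) -> None:
        if self.lmb_owner[lmb] in self.active_partitions:
            # The LMB belongs to a running partition: defer removal until that partition reboots.
            self.pending.add(lmb)
        else:
            # Assumption: an LMB not used by an active partition is removed right away.
            self.deallocated.add(lmb)
        # The whole memory group stays out of use on later system reboots until repaired.
        self.guarded_groups.add(self.lmb_group[lmb])

    def on_partition_reboot(self, partition: str) -> None:
        # Copy the matching LMBs to a list first so the set is not mutated while iterating.
        for b in [b for b in self.pending if self.lmb_owner[b] == partition]:
            self.pending.discard(b)
            self.deallocated.add(b)

    def usable_after_system_boot(self) -> Set[int]:
        # LMBs in a guarded group are skipped even if they never failed themselves.
        return {
            b for b, group in self.lmb_group.items()
            if group not in self.guarded_groups and b not in self.deallocated
        }

    def on_repair(self, group: int) -> None:
        # Assumption: replacing the parts clears every mark in that group.
        self.guarded_groups.discard(group)
        members = {b for b, g in self.lmb_group.items() if g == group}
        self.pending -= members
        self.deallocated -= members
```

In this model, an error in LMB 0 of a running partition leaves the block pending until that partition reboots. The rest of its memory group is still excluded on every system boot until `on_repair` is called.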
Memory persistent deallocation
Defective memory that is discovered at boot time is automatically switched off. If the service
processor detects a memory fault at boot time, it marks the affected memory as bad so that it
is not used on subsequent reboots.
If the service processor identifies faulty memory in a server that includes CoD memory, the
POWER Hypervisor attempts to replace the faulty memory with available CoD memory.
Faulty resources are marked as deallocated, and working resources are included in the active
memory space. Because these activities reduce the amount of CoD memory available for
future use, schedule repair of the faulty memory as soon as convenient.
Upon reboot, if not enough memory is available to meet minimum partition requirements, the
POWER Hypervisor reduces the capacity of one or more partitions.
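A minimal sketch of the boot-time arithmetic follows. It assumes that memory is counted in whole gigabytes and that faulty active memory is backfilled one-for-one from inactive CoD memory. The function and field names are hypothetical. The sketch only flags when partitions must be reduced, because the text does not say how the hypervisor chooses which partitions to shrink.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BootMemoryPlan:
    active_gb: int         # memory left for partitions after substitution
    cod_remaining_gb: int  # inactive CoD memory still available for future activation
    shortfall_gb: int      # faulty memory that CoD could not cover


def plan_boot_memory(active_gb: int, faulty_gb: int, cod_inactive_gb: int) -> BootMemoryPlan:
    """Toy model: faulty active memory is backfilled from inactive CoD memory where possible.

    active_gb is the memory that was active before the fault, including the faulty part.
    """
    substituted = min(faulty_gb, cod_inactive_gb)
    shortfall = faulty_gb - substituted
    return BootMemoryPlan(
        active_gb=active_gb - shortfall,
        cod_remaining_gb=cod_inactive_gb - substituted,
        shortfall_gb=shortfall,
    )


def needs_partition_reduction(plan: BootMemoryPlan, partition_minimums_gb: List[int]) -> bool:
    # Only flags the condition; which partitions the hypervisor shrinks is not modeled.
    return sum(partition_minimums_gb) > plan.active_gb
```

For example, with 128 GB active, 8 GB faulty, and 16 GB of inactive CoD memory, the active total is preserved but only 8 GB of CoD memory remains for future use. This loss of spare capacity is why the text recommends scheduling the repair promptly.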
Depending on the configuration of the system, the HMC Service Focal Point™, the OS
Service Focal Point, or the service processor receives a notification of the failed component
and triggers a service call.
4.2.4 Active Memory Mirroring for Hypervisor
Active Memory Mirroring (AMM) for Hypervisor is a hardware and firmware function of
Power 770 and Power 780 systems that enables the POWER7 chip to maintain two copies of
data in memory. Keeping two copies prevents a system-wide outage caused by an
uncorrectable failure of a single DIMM in the main memory used by the hypervisor (also
called system firmware). This capability is standard and enabled by default on the Power 780
server. On the Power 770, it is an optional chargeable feature.
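The effect of mirroring can be illustrated with a toy model. In it, every write goes to two simulated DIMMs, and a read falls back to the second copy when the first reports an uncorrectable error. The class names are hypothetical. Always trying one fixed copy first is a simplifying assumption; it does not describe how the POWER7 memory controller actually selects a copy.

```python
class UncorrectableError(Exception):
    """Raised when a simulated DIMM cannot return correct data."""


class SimulatedDimm:
    def __init__(self, size: int) -> None:
        self.cells = [0] * size
        self.failed = False  # set to True to simulate an uncorrectable DIMM failure

    def write(self, addr: int, value: int) -> None:
        # A failed DIMM silently drops writes in this toy model.
        if not self.failed:
            self.cells[addr] = value

    def read(self, addr: int) -> int:
        if self.failed:
            raise UncorrectableError(f"uncorrectable error at address {addr}")
        return self.cells[addr]


class MirroredRegion:
    """Toy model of a mirrored hypervisor memory region: every write goes to both copies."""

    def __init__(self, size: int) -> None:
        self.primary = SimulatedDimm(size)
        self.mirror = SimulatedDimm(size)

    def write(self, addr: int, value: int) -> None:
        self.primary.write(addr, value)
        self.mirror.write(addr, value)

    def read(self, addr: int) -> int:
        try:
            return self.primary.read(addr)
        except UncorrectableError:
            # The surviving copy still holds the data, so the hypervisor keeps running.
            return self.mirror.read(addr)
```

The model survives the failure of either single copy. It raises an error only when both copies fail, which mirrors the single-DIMM scope of the protection described above.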
What memory is mirrored
The following hypervisor data is mirrored:
– Hardware Page Tables (HPTs) that are managed by the hypervisor on behalf of
partitions to track the state of the memory pages assigned to the partition
– Translation control entries (TCEs) that are managed by the hypervisor on behalf of
partitions to communicate with partition I/O buffers for I/O devices
– Hypervisor code (instructions that make up the hypervisor kernel)
– Memory used by the hypervisor to maintain partition configuration, I/O states, virtual I/O
information, partition state, and so on
Note: Memory page deallocation handles single-cell failures, but because of the sheer size
of data in a data bit line, it might be inadequate for dealing with more catastrophic failures.