White Papers
Dell EMC Fault Resilient Memory
Introduction to Dell Fault Resilient Memory (FRM)
1.8 Configure Reliable Memory for Virtual Machines
VMware defines a priority mechanism for the processes running on ESXi to ensure that the reliable
memory region is mapped to processes according to their priority. The VMkernel and the VMM get the highest
priority (Priority 0) and use the reliable memory region when it is enabled from the platform.
Some critical userworld processes running on ESXi, such as the hostd and vpxa services, also use the
reliable memory region through the tag ‘memreliable’. These critical userworld processes are marked as Priority
1. Similarly, an administrator can map a virtual machine’s memory to the protected region by
explicitly adding a parameter to the virtual machine’s .vmx file. For more information, see VMware KB
2146595.
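The .vmx edit described above can be sketched as a small Python helper. This is an illustrative sketch only: the key name sched.mem.reliable and the value "TRUE" are assumptions that should be confirmed against VMware KB 2146595 before use, and the helper name set_vmx_option is hypothetical. The sketch works on the text of a .vmx file and returns the updated text, so it does not touch any file by itself.

```python
# Assumed key and value; confirm both against VMware KB 2146595.
RELIABLE_KEY = "sched.mem.reliable"
RELIABLE_VALUE = "TRUE"


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key` set to `value`.

    An existing entry for the key is replaced, duplicates are dropped,
    and the entry is appended if it is not present. Keys are compared
    case-insensitively, which is a simplification made by this sketch.
    """
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        # The option name is everything before the first '='.
        name = line.split("=", 1)[0].strip()
        if name.lower() == key.lower():
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any further duplicate of the key is skipped.
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = 'displayName = "vm1"\nmemSize = "4096"\n'
    print(set_vmx_option(sample, RELIABLE_KEY, RELIABLE_VALUE), end="")
```

Applying the helper twice gives the same result as applying it once, so it is safe to rerun. In practice, .vmx files are normally edited while the virtual machine is powered off.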
1.9 Monitoring FRM Redundancy Failure
This section describes how to monitor memory redundancy events from the iDRAC System Event
Log (SEL). When an uncorrectable error (UCE) occurs in a reliable region for the first time, the SEL logs an
entry, but the hypervisor does not crash. The entry indicates that the memory redundancy provided by FRM is
lost, and that one more uncorrectable error at any memory address in Socket 0 will result
in a Purple Screen of Death (PSOD).
Note that the system continues working with more than one UCE, as long as the error is not persistent. The
system always writes back the data when a UCE is detected. If the UCE persists after the write-back,
the memory redundancy is lost.
Figure 6: FRM memory redundancy lost when a UCE occurs in the memory channels in Socket 0
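The redundancy behavior described in this section can be summarized as a small state model. The Python sketch below is only an interpretation of the text, not a description of the platform firmware: the names FrmState and on_uce are hypothetical, and the model assumes that every UCE it sees occurs in the reliable region of Socket 0. While redundancy is intact, a UCE that is cleared by the write-back leaves the state unchanged, and a UCE that persists after the write-back causes redundancy to be lost. Once redundancy is lost, the model treats any further UCE as fatal.

```python
from enum import Enum


class FrmState(Enum):
    REDUNDANT = "redundant"              # FRM redundancy intact
    REDUNDANCY_LOST = "redundancy_lost"  # SEL entry logged, host still running
    PSOD = "psod"                        # hypervisor has crashed


def on_uce(state: FrmState, persistent_after_writeback: bool) -> FrmState:
    """Return the next state after a UCE in the reliable region.

    `persistent_after_writeback` is True when the error is still
    present after the system has written the data back.
    """
    if state is FrmState.PSOD:
        return state  # nothing changes after a crash
    if state is FrmState.REDUNDANT:
        if persistent_after_writeback:
            return FrmState.REDUNDANCY_LOST
        return FrmState.REDUNDANT  # transient error, cleared by write-back
    # Redundancy already lost: one more UCE is modeled as fatal.
    return FrmState.PSOD


if __name__ == "__main__":
    state = FrmState.REDUNDANT
    for persistent in (False, False, True, False):
        state = on_uce(state, persistent)
        print(state.value)
```

The example run feeds two transient errors, one persistent error, and one further error into the model, which walks through all three states in order.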