Challenges for Operating Systems
Because a NUMA architecture provides a single system image, it can often run an operating system with
no special optimizations.
The high latency of remote memory accesses can leave the processors under-utilized, constantly waiting
for data to be transferred to the local node, and the NUMA interconnect can become a bottleneck for
applications with high memory-bandwidth demands.
Furthermore, performance on such a system can be highly variable. It varies, for example, if an
application's memory sits on the local node during one benchmarking run, but a subsequent run happens
to place all of that memory on a remote node. This variability can make capacity planning difficult.
Some high-end UNIX systems provide support for NUMA optimizations in their compilers and
programming libraries. This support requires software developers to tune and recompile their programs
for optimal performance. Optimizations for one system are not guaranteed to work well on the next
generation of the same system. Other systems have allowed an administrator to explicitly decide on the
node on which an application should run. While this might be acceptable for certain applications that
demand 100 percent of their memory to be local, it creates an administrative burden and can lead to
imbalance between nodes when workloads change.
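To make the manual approach concrete, the following minimal sketch (in Python, assuming a Linux system) restricts a process to the CPUs of a single NUMA node, which is roughly what an administrator-chosen placement amounts to. The node number is an arbitrary choice for illustration, the sysfs path and sched_setaffinity call are Linux-specific, and the sketch binds only CPUs; binding memory to the node as well would require a tool such as numactl or the libnuma library. It illustrates the general technique, not an ESXi mechanism.

import os

def cpus_of_node(node: int) -> set[int]:
    """Parse the kernel's CPU list for a NUMA node, for example '0-7,16-23'."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        spec = f.read().strip()
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

if __name__ == "__main__":
    target_node = 0  # node chosen by the administrator (hypothetical choice)
    # Constrain the current process (pid 0) to that node's CPUs only.
    os.sched_setaffinity(0, cpus_of_node(target_node))
    print("Now restricted to CPUs:", sorted(os.sched_getaffinity(0)))

Every such binding has to be revisited whenever workloads or hardware change, which is exactly the administrative burden described above.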
Ideally, the system software provides transparent NUMA support, so that applications can benefit
immediately without modifications. The system should maximize the use of local memory and schedule
programs intelligently without requiring constant administrator intervention. Finally, it must respond well to
changing conditions without compromising fairness or performance.
How ESXi NUMA Scheduling Works
ESXi uses a sophisticated NUMA scheduler to dynamically balance both processor load and memory
locality.
1 Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is
one of the system’s NUMA nodes containing processors and local memory, as indicated by the
System Resource Allocation Table (SRAT).
2 When memory is allocated to a virtual machine, the ESXi host preferentially allocates it from the
home node. The virtual CPUs of the virtual machine are constrained to run on the home node to
maximize memory locality.
3 The NUMA scheduler can dynamically change a virtual machine's home node to respond to changes
in system load. The scheduler might migrate a virtual machine to a new home node to reduce
processor load imbalance. Because this might cause more of its memory to be remote, the scheduler
might migrate the virtual machine’s memory dynamically to its new home node to improve memory
locality. The NUMA scheduler might also swap virtual machines between nodes when this improves
overall memory locality.
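The short program below is a toy model of these three steps, not ESXi's actual algorithm: the class names, the placement policy, and the rule for deciding when a home-node migration is worthwhile are all invented for illustration. It only demonstrates how a home node ties vCPU placement and memory locality together, and how changing a virtual machine's home node temporarily leaves its memory remote until that memory is migrated as well.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    vms: list[VM] = field(default_factory=list)

    @property
    def cpu_load(self) -> int:
        return sum(vm.vcpus for vm in self.vms)

@dataclass
class VM:
    name: str
    vcpus: int
    home: Node | None = None
    local_mb: int = 0    # memory allocated on the home node
    remote_mb: int = 0   # memory left behind after a home-node change

def place(vm: VM, nodes: list[Node]) -> None:
    """Step 1: assign the least-loaded node as the VM's home node."""
    home = min(nodes, key=lambda n: n.cpu_load)
    home.vms.append(vm)
    vm.home = home

def allocate(vm: VM, mb: int) -> None:
    """Step 2: satisfy memory allocations preferentially from the home node."""
    vm.local_mb += mb

def rebalance(nodes: list[Node]) -> None:
    """Step 3: move one VM's home node from the busiest to the idlest node
    when that strictly narrows the CPU-load gap, then migrate its memory."""
    busiest = max(nodes, key=lambda n: n.cpu_load)
    idlest = min(nodes, key=lambda n: n.cpu_load)
    gap = busiest.cpu_load - idlest.cpu_load
    movable = [vm for vm in busiest.vms if vm.vcpus < gap]
    if not movable:
        return
    vm = min(movable, key=lambda v: v.vcpus)   # smallest move that still helps
    busiest.vms.remove(vm)
    idlest.vms.append(vm)
    vm.home = idlest
    vm.remote_mb += vm.local_mb   # its memory is now remote ...
    vm.local_mb = 0               # ... until page migration catches up

if __name__ == "__main__":
    nodes = [Node(0), Node(1)]
    vms = [VM("web", 1), VM("app", 1), VM("db", 2), VM("batch", 6)]
    for vm in vms:
        place(vm, nodes)
        allocate(vm, 1024)
    rebalance(nodes)
    for vm in vms:
        print(f"{vm.name}: home node {vm.home.id}, "
              f"{vm.local_mb} MB local, {vm.remote_mb} MB remote")

The real scheduler weighs additional factors, such as the cost of migrating the memory itself and fairness between virtual machines, when deciding whether a home-node change is worthwhile.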