Challenges for Operating Systems
Because a NUMA architecture provides a single system image, it can often run an operating system with
no special optimizations.
The high latency of remote memory accesses can leave the processors under-utilized, constantly waiting
for data to be transferred to the local node, and the NUMA interconnect can become a bottleneck for
applications with high memory-bandwidth demands.
Furthermore, performance on such a system can be highly variable. It varies, for example, if an
application's memory sits on the local node during one benchmarking run, but a subsequent run happens
to place all of that memory on a remote node. This variability can make capacity planning difficult.
Some high-end UNIX systems provide support for NUMA optimizations in their compilers and
programming libraries. This support requires software developers to tune and recompile their programs
for optimal performance. Optimizations for one system are not guaranteed to work well on the next
generation of the same system. Other systems have allowed an administrator to explicitly decide on the
node on which an application should run. While this might be acceptable for certain applications that
demand 100 percent of their memory to be local, it creates an administrative burden and can lead to
imbalance between nodes when workloads change.
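To make the manual approach concrete, the following minimal sketch (in Python, assuming a Linux system) restricts a process to the CPUs of a single NUMA node, which is roughly what an administrator-chosen placement amounts to. The node number is an arbitrary choice for illustration, the sysfs path and sched_setaffinity call are Linux-specific, and the sketch binds only CPUs; binding memory to the node as well would require a tool such as numactl or the libnuma library. It illustrates the general technique, not an ESXi mechanism.

import os

def cpus_of_node(node: int) -> set[int]:
    """Parse the kernel's CPU list for a NUMA node, for example '0-7,16-23'."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        spec = f.read().strip()
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

if __name__ == "__main__":
    target_node = 0  # node chosen by the administrator (hypothetical choice)
    # Constrain the current process (pid 0) to that node's CPUs only.
    os.sched_setaffinity(0, cpus_of_node(target_node))
    print("Now restricted to CPUs:", sorted(os.sched_getaffinity(0)))

Every such binding has to be revisited whenever workloads or hardware change, which is exactly the administrative burden described above.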
Ideally, the system software provides transparent NUMA support, so that applications can benefit
immediately without modifications. The system should maximize the use of local memory and schedule
programs intelligently without requiring constant administrator intervention. Finally, it must respond well to
changing conditions without compromising fairness or performance.
How ESXi NUMA Scheduling Works
ESXi uses a sophisticated NUMA scheduler to dynamically balance both processor load and memory
locality.
1 Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is
one of the system’s NUMA nodes containing processors and local memory, as indicated by the
System Resource Allocation Table (SRAT).
2 When memory is allocated to a virtual machine, the ESXi host preferentially allocates it from the
home node. The virtual CPUs of the virtual machine are constrained to run on the home node to
maximize memory locality.
3 The NUMA scheduler can dynamically change a virtual machine's home node to respond to changes
in system load. The scheduler might migrate a virtual machine to a new home node to reduce
processor load imbalance. Because this might cause more of its memory to be remote, the scheduler
might migrate the virtual machine’s memory dynamically to its new home node to improve memory
locality. The NUMA scheduler might also swap virtual machines between nodes when this improves
overall memory locality.
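The short program below is a toy model of these three steps, not ESXi's actual algorithm: the class names, the placement policy, and the rule for deciding when a home-node migration is worthwhile are all invented for illustration. It only demonstrates how a home node ties vCPU placement and memory locality together, and how changing a virtual machine's home node temporarily leaves its memory remote until that memory is migrated as well.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    vms: list[VM] = field(default_factory=list)

    @property
    def cpu_load(self) -> int:
        return sum(vm.vcpus for vm in self.vms)

@dataclass
class VM:
    name: str
    vcpus: int
    home: Node | None = None
    local_mb: int = 0    # memory allocated on the home node
    remote_mb: int = 0   # memory left behind after a home-node change

def place(vm: VM, nodes: list[Node]) -> None:
    """Step 1: assign the least-loaded node as the VM's home node."""
    home = min(nodes, key=lambda n: n.cpu_load)
    home.vms.append(vm)
    vm.home = home

def allocate(vm: VM, mb: int) -> None:
    """Step 2: satisfy memory allocations preferentially from the home node."""
    vm.local_mb += mb

def rebalance(nodes: list[Node]) -> None:
    """Step 3: move one VM's home node from the busiest to the idlest node
    when that strictly narrows the CPU-load gap, then migrate its memory."""
    busiest = max(nodes, key=lambda n: n.cpu_load)
    idlest = min(nodes, key=lambda n: n.cpu_load)
    gap = busiest.cpu_load - idlest.cpu_load
    movable = [vm for vm in busiest.vms if vm.vcpus < gap]
    if not movable:
        return
    vm = min(movable, key=lambda v: v.vcpus)   # smallest move that still helps
    busiest.vms.remove(vm)
    idlest.vms.append(vm)
    vm.home = idlest
    vm.remote_mb += vm.local_mb   # its memory is now remote ...
    vm.local_mb = 0               # ... until page migration catches up

if __name__ == "__main__":
    nodes = [Node(0), Node(1)]
    vms = [VM("web", 1), VM("app", 1), VM("db", 2), VM("batch", 6)]
    for vm in vms:
        place(vm, nodes)
        allocate(vm, 1024)
    rebalance(nodes)
    for vm in vms:
        print(f"{vm.name}: home node {vm.home.id}, "
              f"{vm.local_mb} MB local, {vm.remote_mb} MB remote")

The real scheduler weighs additional factors, such as the cost of migrating the memory itself and fairness between virtual machines, when deciding whether a home-node change is worthwhile.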