Locality-Optimized Resource Alignment for Superdome 2

16 partitions of equal size. It would be less ideal to have 8 virtual partitions, because you would be
forced to split some of them across two different sockets.
A minor optimization, and one which also applies to 100% interleaved nPartitions, is to keep the
cores on a socket in the same virtual partition, because they share a common cache. Keeping such
cores working together on the same application can give a small performance benefit, which is worth
realizing if the choice of cores is otherwise arbitrary. Use mpsched -K to identify the cores that
share a common socket.
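The socket-aware grouping heuristic can be sketched in Python. The core-to-socket map below is hypothetical; on a real system it would be built from the output of mpsched -K, which reports the cores sharing a socket.

```python
# Sketch of the socket-aware core-grouping heuristic described above.
# The topology map is hypothetical; on a real Superdome 2 it would be
# built from "mpsched -K" output (cores on one socket share a cache).

def group_cores_by_socket(core_to_socket):
    """Group core IDs by the socket they reside on."""
    sockets = {}
    for core, socket in core_to_socket.items():
        sockets.setdefault(socket, []).append(core)
    return sockets

def assign_vpar_cores(core_to_socket, cores_needed):
    """Pick cores for a virtual partition one whole socket at a time,
    so cores that share a cache land in the same partition."""
    chosen = []
    for socket, cores in sorted(group_cores_by_socket(core_to_socket).items()):
        for core in sorted(cores):
            if len(chosen) == cores_needed:
                return chosen
            chosen.append(core)
    return chosen

# Hypothetical 2-socket, 4-cores-per-socket topology: {core: socket}.
topology = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
print(assign_vpar_cores(topology, 4))   # -> [0, 1, 2, 3], all from socket 0
```

A 4-core partition drawn this way occupies socket 0 entirely, so its cores share a cache; an arbitrary choice of cores could instead straddle both sockets.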
Factory preconfiguration
HP offers the option of delivering Superdome 2 systems with virtual partitions preconfigured at the
factory. In that case, the virtual partitions are configured according to the LORA guidelines. As can
be seen from the examples shown above, it is sometimes not possible to achieve perfect alignment for
each virtual partition. The factory configuration tool does the best that it can, and it always
guarantees that the largest virtual partitions have the best possible configuration. If splintering is
unavoidable, the smaller virtual partitions are the ones given the less-than-perfect alignment.
Advanced tuning
An important part of the LORA value proposition is to deliver ease-of-use along with performance.
HP's goal is that LORA should work out-of-the-box, without the need for system administrators to
perform explicit tuning. Several factors make the goal impossible to reach in every single case. The
range of applications deployed across the HP-UX customer base is extremely diverse. So is the
capacity of the servers: the applications could be deployed in a virtual partition with two processor
cores and 3 GB of memory, or in a hard partition with 64 cores and 2 TB of memory. In addition,
workloads can exhibit transient spikes in demand many times greater than the steady-state average.
Here is the LORA philosophy for coping with this dilemma: provide out-of-the-box behavior that is
solid in most circumstances, but implement mechanisms to allow system administrators to adjust the
behavior to suit the idiosyncrasies of their particular workload if they need to. This section discusses
some possibilities for explicit tuning to override the automatic LORA heuristics.
numa_mode kernel tunable parameter
The numa_mode kernel tunable parameter controls the mode of the kernel with respect to NUMA
platform characteristics. Because of the close coupling between memory configuration and kernel
mode, it is recommended to accept the default value of numa_mode, which is 0, meaning to
auto-sense the mode at boot time. Systems configured in accordance with the LORA guidelines are
auto-sensed into LORA mode; otherwise they operate in SMP mode. As described in the
numa_mode man page, the tunable can be adjusted to override the auto-sensing logic.
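The auto-sensing decision, as described above, can be sketched as a small Python function. The boolean input is a hypothetical stand-in for the kernel's boot-time check of the memory configuration; it is not a kernel interface.

```python
# Sketch of boot-time auto-sensing when numa_mode is left at its
# default of 0. The guideline check is a hypothetical stand-in for
# the kernel's inspection of the memory configuration. Nonzero
# numa_mode values override this logic (see the numa_mode man page).

def auto_sensed_mode(memory_follows_lora_guidelines):
    """Return the kernel mode chosen at boot under numa_mode = 0."""
    return "LORA" if memory_follows_lora_guidelines else "SMP"

print(auto_sensed_mode(True))    # -> LORA
print(auto_sensed_mode(False))   # -> SMP
```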
In LORA mode, HP-UX implements a number of heuristics for automatic workload placement to
establish good alignment between the processes executing an application and the memory that they
reference. Every process and every thread is assigned a home locality. Processes and threads may
temporarily be moved away from their home localities to balance the system load, but they are
returned home as soon as is practical. In general, the memory objects created by a process are
placed in memory in its home locality. Shared memory objects too large to fit within a single locality
are distributed evenly across all of the localities in the processor set containing the processor from
which the memory allocation was made.
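These placement rules can be illustrated with a short Python sketch. The locality capacities, the home locality, and the processor-set membership are all hypothetical inputs for illustration, not kernel APIs.

```python
# Sketch of the LORA memory-placement heuristic described above:
# an object that fits goes in the allocating process's home locality;
# a shared object too large for one locality is spread evenly across
# all localities in the allocating processor's processor set.
# Capacities and locality lists are hypothetical inputs, not kernel APIs.

def place_memory(size, locality_capacity, home_locality, pset_localities):
    """Return a mapping {locality: amount placed there}."""
    if size <= locality_capacity[home_locality]:
        return {home_locality: size}           # fits: keep it in the home locality
    share, extra = divmod(size, len(pset_localities))
    return {loc: share + (1 if i < extra else 0)
            for i, loc in enumerate(pset_localities)}

caps = {0: 8, 1: 8, 2: 8, 3: 8}                # GB per locality (hypothetical)
print(place_memory(4, caps, 0, [0, 1, 2, 3]))  # -> {0: 4}
print(place_memory(30, caps, 0, [0, 1, 2, 3])) # -> {0: 8, 1: 8, 2: 7, 3: 7}
```

The second call models a 30 GB shared object that cannot fit in any single 8 GB locality, so it is striped as evenly as possible across the four localities of the processor set.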