Locality-Optimized Resource Alignment for Superdome 2

16 partitions of equal size. It would be less ideal to have 8 virtual partitions, because you would be
forced to split some of them across two different sockets.
A minor optimization, and one which also applies to 100% interleaved nPartitions, is to keep the
cores on a socket in the same virtual partition, because they share a common cache. Keeping such
cores working together on the same application can give a small performance benefit, which is worth
realizing if the choice of cores is otherwise arbitrary. Use mpsched -K to identify the cores that
share a common socket.
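The socket-aware grouping heuristic can be sketched in Python. The core-to-socket map below is hypothetical; on a real system it would be built from the output of mpsched -K, which reports the cores sharing a socket.

```python
# Sketch of the socket-aware core-grouping heuristic described above.
# The topology map is hypothetical; on a real Superdome 2 it would be
# built from "mpsched -K" output (cores on one socket share a cache).

def group_cores_by_socket(core_to_socket):
    """Group core IDs by the socket they reside on."""
    sockets = {}
    for core, socket in core_to_socket.items():
        sockets.setdefault(socket, []).append(core)
    return sockets

def assign_vpar_cores(core_to_socket, cores_needed):
    """Pick cores for a virtual partition one whole socket at a time,
    so cores that share a cache land in the same partition."""
    chosen = []
    for socket, cores in sorted(group_cores_by_socket(core_to_socket).items()):
        for core in sorted(cores):
            if len(chosen) == cores_needed:
                return chosen
            chosen.append(core)
    return chosen

# Hypothetical 2-socket, 4-cores-per-socket topology: {core: socket}.
topology = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
print(assign_vpar_cores(topology, 4))   # -> [0, 1, 2, 3], all from socket 0
```

A 4-core partition drawn this way occupies socket 0 entirely, so its cores share a cache; an arbitrary choice of cores could instead straddle both sockets.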
Factory preconfiguration
HP offers the option of delivering Superdome 2 systems with virtual partitions preconfigured at the
factory. In that case, the virtual partitions are configured according to the LORA guidelines. As can
be seen from the examples shown above, it is sometimes not possible to achieve perfect alignment for
each virtual partition. The factory configuration tool does the best that it can, and it always
guarantees that the largest virtual partitions have the best possible configuration. If splintering is
unavoidable, the smaller virtual partitions are the ones given the less-than-perfect alignment.
Advanced tuning
An important part of the LORA value proposition is to deliver ease-of-use along with performance.
HP's goal is that LORA should work out-of-the-box, without the need for system administrators to
perform explicit tuning. Several factors make the goal impossible to reach in every single case. The
range of applications deployed across the HP-UX customer base is extremely diverse. So is the
capacity of the servers: the applications could be deployed in a virtual partition with two processor
cores and 3 GB of memory, or in a hard partition with 64 cores and 2 TB of memory. In addition,
workloads can exhibit transient spikes in demand many times greater than the steady-state average.
Here is the LORA philosophy for coping with this dilemma: provide out-of-the-box behavior that is
solid in most circumstances, but implement mechanisms to allow system administrators to adjust the
behavior to suit the idiosyncrasies of their particular workload if they need to. This section discusses
some possibilities for explicit tuning to override the automatic LORA heuristics.
numa_mode kernel tunable parameter
The numa_mode kernel tunable parameter controls the mode of the kernel with respect to NUMA
platform characteristics. Because of the close coupling between memory configuration and kernel
mode, it is recommended to accept the default value of numa_mode, which is 0, meaning to
auto-sense the mode at boot time. Systems configured in accordance with the LORA guidelines are
auto-sensed into LORA mode; otherwise they operate in SMP mode. As described in the
numa_mode man page, the tunable can be adjusted to override the auto-sensing logic.
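The auto-sensing decision, as described above, can be sketched as a small Python function. The boolean input is a hypothetical stand-in for the kernel's boot-time check of the memory configuration; it is not a kernel interface.

```python
# Sketch of boot-time auto-sensing when numa_mode is left at its
# default of 0. The guideline check is a hypothetical stand-in for
# the kernel's inspection of the memory configuration. Nonzero
# numa_mode values override this logic (see the numa_mode man page).

def auto_sensed_mode(memory_follows_lora_guidelines):
    """Return the kernel mode chosen at boot under numa_mode = 0."""
    return "LORA" if memory_follows_lora_guidelines else "SMP"

print(auto_sensed_mode(True))    # -> LORA
print(auto_sensed_mode(False))   # -> SMP
```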
In LORA mode, HP-UX implements a number of heuristics for automatic workload placement to
establish good alignment between the processes executing an application and the memory that they
reference. Every process and every thread is assigned a home locality. Processes and threads may
temporarily be moved away from their home localities to balance the system load, but they are
returned home as soon as is practical. In general, the memory objects created by a process are
placed in memory in its home locality. Shared memory objects too large to fit within a single locality
are distributed evenly across all of the localities in the processor set containing the processor from
which the memory allocation was made.
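These placement rules can be illustrated with a short Python sketch. The locality capacities, the home locality, and the processor-set membership are all hypothetical inputs for illustration, not kernel APIs.

```python
# Sketch of the LORA memory-placement heuristic described above:
# an object that fits goes in the allocating process's home locality;
# a shared object too large for one locality is spread evenly across
# all localities in the allocating processor's processor set.
# Capacities and locality lists are hypothetical inputs, not kernel APIs.

def place_memory(size, locality_capacity, home_locality, pset_localities):
    """Return a mapping {locality: amount placed there}."""
    if size <= locality_capacity[home_locality]:
        return {home_locality: size}           # fits: keep it in the home locality
    share, extra = divmod(size, len(pset_localities))
    return {loc: share + (1 if i < extra else 0)
            for i, loc in enumerate(pset_localities)}

caps = {0: 8, 1: 8, 2: 8, 3: 8}                # GB per locality (hypothetical)
print(place_memory(4, caps, 0, [0, 1, 2, 3]))  # -> {0: 4}
print(place_memory(30, caps, 0, [0, 1, 2, 3])) # -> {0: 8, 1: 8, 2: 7, 3: 7}
```

The second call models a 30 GB shared object that cannot fit in any single 8 GB locality, so it is striped as evenly as possible across the four localities of the processor set.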