ccNUMA Overview
Memory allocation
The overall strategy of the memory system is to keep private data local and shared data global. Therefore, the
default behavior will be adequate for most applications. Application developers who understand the data
structures of the application and how the data is accessed can use the tools presented here to optimize
performance.
malloc()
malloc() and the C++ new operator are probably the most common memory allocators, but neither offers a way
to control which locality its memory comes from. Both attempt to satisfy allocations from the local memory of
the LDOM in which they are called.
mmap() and shmget()
mmap() is the primary tool for allocating and managing ccNUMA memory; shmget() accepts the equivalent flags
listed in Table 6.
Table 6. Memory allocation flags
MMAP FLAG             SHMGET FLAG            Physical memory will come from …
MAP_MEM_LOCAL         IPC_MEM_LOCAL          The locality of the CPU making the system call
MAP_MEM_INTERLEAVED   IPC_MEM_INTERLEAVED    Interleaved memory (round robin, or closest first if no interleaved memory is available)
MAP_MEM_FIRST_TOUCH   IPC_MEM_FIRST_TOUCH    The locality of the first CPU to touch the page
If MAP_SHARED is specified, then MAP_MEM_INTERLEAVED is the default. For MAP_PRIVATE,
MAP_MEM_FIRST_TOUCH is the default. With MAP_MEM_LOCAL and MAP_MEM_FIRST_TOUCH, placement depends on the
LDOM in which the thread or process is executing when it makes the call or first touches the page. Use mpctl()
(described in “Utilizing the execution domain”) to ascertain and control the execution domain.
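As a minimal sketch of requesting a placement policy through mmap(), the helper below maps anonymous memory with a caller-supplied locality flag. The name map_with_locality is hypothetical. The fallback #define lines are an assumption that lets the sketch compile on systems whose <sys/mman.h> lacks the HP-UX flags; on such systems the flag has no effect and the default placement applies.

```c
/* Assumption: exposes MAP_ANONYMOUS on glibc in strict -std modes;
 * harmless on systems that ignore it. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: fall back to 0 where the HP-UX locality flags are not
 * defined, so the sketch still compiles (placement is then left to the
 * system default). */
#ifndef MAP_MEM_LOCAL
#define MAP_MEM_LOCAL 0
#endif
#ifndef MAP_MEM_INTERLEAVED
#define MAP_MEM_INTERLEAVED 0
#endif
#ifndef MAP_MEM_FIRST_TOUCH
#define MAP_MEM_FIRST_TOUCH 0
#endif

/* Hypothetical helper: map `len` bytes of zero-filled anonymous memory.
 * `sharing` is MAP_SHARED or MAP_PRIVATE; `locality` is one of the
 * MAP_MEM_* flags from Table 6, or 0 to accept the default policy.
 * Returns NULL on failure. */
void *map_with_locality(size_t len, int sharing, int locality)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   sharing | MAP_ANONYMOUS | locality, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

For instance, map_with_locality(len, MAP_SHARED, MAP_MEM_LOCAL) would override the interleaved default for a shared mapping, while passing 0 as the last argument leaves the default in force.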
Fortran/common/
Fortran common and static data (such as local variables compiled with +save) is mapped to the process’s
data/bss section. The system command chatr has been extended to select where to map this memory. The
default is to map this data in local memory. Some programs will want to map common to interleaved memory,
especially if all the threads cannot execute within one LDOM. Depending on the application, reference patterns to
data may also make interleaved memory the appropriate place to map data in /common/.
Performance-optimized page sizing (POPS)
HP-UX 11i v2 implements performance-optimized page sizing (POPS). Traditionally, UNIX implements a 4 KB
page, but for modern applications this is much too small to map large amounts of memory efficiently. With 4 KB
pages, the processor’s TLB can map only small regions of an application’s memory at a time, and the resulting TLB
misses carry a very large performance penalty. POPS transparently increases the page size for applications up
to the vps_ceiling kernel tunable. Applications are not aware of the page size. Developers and users can set page
size hints through link-time flags and using the chatr tool.
The largest page size supported by HP-UX 11i v2 is 4 GB. Valid values are 4 KB, 16 KB, 64 KB, … ,
1 GB, 4 GB; that is, 4^n KB, where n varies from 1 to 11.
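The list of valid sizes follows directly from the rule that each is a power of four in kilobytes. A small sketch of that arithmetic is shown here; the function names are hypothetical and are not part of any HP-UX interface.

```c
#include <assert.h>
#include <stdint.h>

/* Page size in bytes for exponent n, i.e. 4^n KB with n in 1..11. */
uint64_t pops_page_size(unsigned n)
{
    assert(n >= 1 && n <= 11);
    return (UINT64_C(1) << (2 * n)) * 1024;   /* 4^n == 2^(2n) */
}

/* Returns 1 if `bytes` is one of the eleven valid page sizes, else 0. */
int pops_is_valid_page_size(uint64_t bytes)
{
    for (unsigned n = 1; n <= 11; n++) {
        if (pops_page_size(n) == bytes)
            return 1;
    }
    return 0;
}
```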
When a process allocates memory in the interleaved region, it is important to match the requested amount of
memory to the page sizes available for the process. For example, ensure that the amount of memory requested is
a simple multiple of a page size that is available on the system.
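One way to follow this advice is to round each request up to a whole number of pages before allocating. The sketch below assumes the caller already knows which page size applies; the name round_up_to_page is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round `request` up to the next multiple of `page_size`.
 * `page_size` must be a power of two, which holds for every size from
 * 4 KB to 4 GB. Assumes request + page_size does not overflow. */
uint64_t round_up_to_page(uint64_t request, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (request + page_size - 1) & ~(page_size - 1);
}
```

With a 64 MB page size, for example, a 100 MB request would be rounded up to 128 MB, that is, two full pages.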