ccNUMA Overview
Introduction
ccNUMA (cache-coherent Non-Uniform Memory Architecture) systems offer programmers and users the simplicity
and flexibility of symmetric multiprocessing (SMP) with the memory scalability of clusters. In a ccNUMA system,
processors, memory, and I/O are grouped together into cells. In terms of latency and bandwidth, communication
within a cell is “fast,” while communication that leaves the cell is “slow.” Because the memory in ccNUMA
systems is physically distributed but logically shared, these systems offer better performance to applications that
are optimized to use their features. Non-optimized applications also benefit: the default behavior is designed to
be benign, if not beneficial, and these applications still have access to much larger shared resources of memory,
CPUs, and disk space.
HP-UX makes use of ccNUMA technology on HP's newest, most scalable machines. The HP Integrity systems will
have ccNUMA support in HP-UX 11i v2, while the HP 9000 systems will have ccNUMA support in HP-UX 11i v3.
Executive summary
The newest, largest HP systems make use of a design called ccNUMA. In this design, processors, memory, and
I/O are grouped together into cells. The latency and bandwidth characteristics of memory within a cell are “fast,”
while accesses outside a cell are “slow.” HP-UX makes use of this design by attempting to keep the majority of
accesses local to the same cell. This is achieved through good default behavior and by providing interfaces for
ccNUMA-aware applications to make the best use of the architecture.
Problem statement
SMP systems cannot offer sufficient memory bandwidth for large numbers of processors without incurring
excessive penalties due to memory latency. ccNUMA architectures offer scalable memory bandwidth while
keeping memory latencies reasonable—often delivering latencies in the same class as much smaller systems.
Many applications are intended to run unmodified and still perform well, but to reach maximum performance,
important applications may need to be modified to exploit the new features of HP-UX 11i v2.
Introduction to ccNUMA
Physical memory characteristics
Physical memory refers to the actual physical arrangement of memory and its connection to the rest of the
computer system. Because applications running on HP-UX deal with the system in terms of virtual memory,
application designers rarely consider the physical memory layout or optimize for it.
In cell-based systems like high-end HP servers, memory is arranged in cells, often symmetrically throughout the
system. Symmetry is not required, however: because of budget constraints or failures, memory may not be evenly
distributed across all the cells. These systems can support any amount of memory up to 32 GB on a cell.
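The cell arrangement described above can be pictured with a small model. The following is only an illustrative sketch: the `Cell` class, the helper functions, and the four-cell configuration are hypothetical and do not correspond to any HP-UX interface. The only figure taken from the text is the 32 GB per-cell ceiling.

```python
from dataclasses import dataclass

# Per-cell ceiling stated in the text above.
MAX_CELL_MEMORY_GB = 32


@dataclass
class Cell:
    """Hypothetical model of one cell; not an HP-UX data structure."""
    cell_id: int
    memory_gb: float

    def __post_init__(self):
        # Reject configurations outside the 0..32 GB range for one cell.
        if not 0 <= self.memory_gb <= MAX_CELL_MEMORY_GB:
            raise ValueError(
                f"cell {self.cell_id}: {self.memory_gb} GB is outside "
                f"0..{MAX_CELL_MEMORY_GB} GB"
            )


def total_memory_gb(cells):
    """Capacity visible to the system: the sum over all cells."""
    return sum(cell.memory_gb for cell in cells)


def is_symmetric(cells):
    """True when every cell holds the same amount of memory."""
    return len({cell.memory_gb for cell in cells}) <= 1


# Hypothetical system in which one cell carries half the memory of the others,
# for example after a failure or a partial upgrade.
cells = [Cell(0, 32), Cell(1, 32), Cell(2, 16), Cell(3, 32)]

print("total memory (GB):", total_memory_gb(cells))
print("symmetric:", is_symmetric(cells))
```

An asymmetric layout such as this one is still a valid configuration. It simply means that the amount of memory that can be reached locally differs from cell to cell.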
Capacity is the quantity of memory that is available to an application.
Latency is the time it takes for a memory reference from a processor to be satisfied by the memory system. It is
often measured in nanoseconds and typically means the “load to use” latency: the time for the processor to put
the request on the memory bus, for the request to reach the memory controller and pass through the cache
hierarchy, for the memory system to satisfy the request, and for the data to return to the register on the
requesting processor.
Bandwidth is the rate at which data is transferred to the processor or to the I/O subsystem.
Occupancy is the amount of time a memory image exists in memory. It matters because many applications of
interest have very high occupancy, meaning their memory images exist for long periods of time. This gives the
system opportunities to modify and optimize the performance of an application.
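To see why keeping accesses local to a cell matters, the latency definition above can be combined with the “fast” and “slow” distinction between local and remote accesses. The sketch below computes a simple weighted-average latency. The function name and the nanosecond figures are hypothetical placeholders chosen for illustration, not measurements of any HP system, and the model ignores caching and contention.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Weighted-average memory latency for a mix of local and remote accesses.

    local_fraction is the share of accesses satisfied within the cell (0.0-1.0).
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 100.0
REMOTE_NS = 300.0

for fraction in (0.25, 0.5, 0.9):
    latency = average_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local accesses: about {latency:.1f} ns on average")
```

Under these assumed numbers, the average latency moves toward the local figure as the local fraction rises, which is the effect that good default placement and ccNUMA-aware applications aim for.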