Specifications
April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide
node distances:
node   0   1   2   3
  0:  10  16  16  16
  1:  16  10  16  16
  2:  16  16  10  16
  3:  16  16  16  10
If the size: value differs between nodes, then the DIMMs are either not identical or not plugged into the
right sockets.
If numactl --hardware shows only one node, then ACPI is not operating properly in the kernel. Either ACPI
is not enabled in the BIOS, or the kernel is too old to recognize the NUMA configuration that the BIOS
provides during boot.
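The two checks above can be scripted. The following is a minimal sketch, not part of the original guide:
the function name check_numa_layout is hypothetical, and it assumes the usual numactl --hardware layout
with one "available: N nodes" line and one "node K size: S MB" line per node. It treats any size
difference as a mismatch, which may be stricter than needed if the kernel reports slightly different
sizes for otherwise identical nodes.

```shell
#!/bin/sh
# check_numa_layout: read `numactl --hardware` output on stdin and report
# whether more than one node is visible and whether all node sizes match.
# Returns 0 when both checks pass, 1 otherwise.
check_numa_layout() {
    awk '
        # "available: 4 nodes (0-3)" -> number of NUMA nodes
        $1 == "available:" { nodes = $2 }

        # "node 0 size: 16384 MB" -> compare each size with the first one seen
        $1 == "node" && $3 == "size:" {
            if (first == "") first = $4
            else if ($4 != first) mismatch = 1
        }

        END {
            if (nodes <= 1) {
                print "WARN: only " nodes + 0 " NUMA node reported"
                exit 1
            }
            if (mismatch) {
                print "WARN: node sizes differ"
                exit 1
            }
            print "OK: " nodes " nodes, " first " MB each"
        }'
}
```

On a live system it would typically be fed directly from the tool: numactl --hardware | check_numa_layout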
2.4 STREAM to Verify Configuration
Many performance issues are caused by poorly configured memory. Because HPC performance depends
strongly on memory performance, this section describes how to verify that the machine is configured to
achieve the maximum memory performance possible with the AMD Opteron™ 4200/6200 Series processors.
We will use the STREAM benchmark to verify memory bandwidth. We will show how to build STREAM with GCC
for a quick but lower-performance test, and with the AMD Open64 Compiler Suite for a test that can reach
the potential memory bandwidth performance expected with AMD Opteron™ 4200/6200 Series processors.
• GET AND BUILD STREAM
1. Download STREAM from the University of Virginia at: http://www.cs.virginia.edu/stream/.
2. Download the C source code from: http://www.cs.virginia.edu/stream/FTP/Code/stream.c.
3. Edit stream.c and change the definitions of N, NTIMES, and OFFSET to the following:
#define N 873800
#define NTIMES 30
#define OFFSET 1840
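Step 3 can also be applied non-interactively. This sketch assumes a stream.c in which N, NTIMES, and
OFFSET are defined by plain #define (or # define) lines that begin in the first column; the function
name set_stream_params is hypothetical. Versions of stream.c that use other macro names would need
different patterns.

```shell
#!/bin/sh
# set_stream_params FILE: rewrite the N, NTIMES and OFFSET macros in FILE
# to the values given in step 3. A temporary file is used instead of
# `sed -i`, whose syntax differs between sed implementations.
set_stream_params() {
    src="$1"
    tmp="$src.tmp"
    sed -E \
        -e 's/^#[[:space:]]*define[[:space:]]+N[[:space:]].*$/#define N 873800/' \
        -e 's/^#[[:space:]]*define[[:space:]]+NTIMES[[:space:]].*$/#define NTIMES 30/' \
        -e 's/^#[[:space:]]*define[[:space:]]+OFFSET[[:space:]].*$/#define OFFSET 1840/' \
        "$src" > "$tmp" && mv "$tmp" "$src"
}
```

A typical call would be set_stream_params stream.c, followed by a quick grep define stream.c to confirm
that all three lines were changed.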
Non-uniform memory access (NUMA) is a system design that became popular on UNIX servers in the
mid-1990s. It is based on the concept that access to all bytes in memory need not occur at uniform access
rates. Today’s non-uniform memory access is a performance-enhancing technique that leverages the
extraordinary amount of memory now available on systems. For an additional discussion of NUMA, see
http://developer.amd.com/assets/LibNUMA-WP-fv1.pdf.
2.5 Easy STREAM Using GCC Compiler
GCC is included with any Linux distribution. Unfortunately, GCC does not generate efficient code for STREAM.
Nonetheless, using STREAM built with GCC can uncover memory performance issues.
• BUILD STREAM WITH GCC VERSION 4.3.4, WHICH IS NATIVE TO SLES 11 SP1, USING THE FOLLOWING
FLAGS:
-O2 -msse -msse2 -msse3 -o stream stream.c -static -fopenmp
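Put together as a full command line, the flags above might be used as follows. The sketch assumes that
gcc is on the PATH and that stream.c is in the current directory; the function name build_stream_gcc and
the CC override are illustrative conveniences, not part of the guide.

```shell
#!/bin/sh
# build_stream_gcc [SRC] [OUT]: compile STREAM with the flags listed above.
# SRC defaults to stream.c and OUT to stream. Setting CC overrides the
# compiler; CC=echo previews the argument list without compiling.
build_stream_gcc() {
    src="${1:-stream.c}"
    out="${2:-stream}"
    "${CC:-gcc}" -O2 -msse -msse2 -msse3 -o "$out" "$src" -static -fopenmp
}
```

After build_stream_gcc succeeds, the benchmark is run as ./stream. Because the binary is built with
-fopenmp, the number of threads can be set with the standard OMP_NUM_THREADS environment variable.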