Specifications

April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide
Add: 16798.1705 0.1253 0.1248 0.1267
Triad: 16724.2629 0.1259 0.1254 0.1279
-------------------------------------------------------------
Repeat on each NUMA node. If the bandwidth achieved differs significantly from one NUMA node to
another, check the memory configuration of the node that underperforms.
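
The per-node repetition described above can be scripted. The following is a minimal sketch, not the guide's own procedure: it assumes numactl is installed, that the benchmark binary is ./stream, and that the system has four NUMA nodes. The names NODES, STREAM_BIN and stream_cmd_for_node are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: run STREAM once per NUMA node, binding both CPUs and memory to that node.
# Assumptions (not from the guide): numactl is installed, the binary is ./stream,
# and the node count is 4. Adjust NODES and STREAM_BIN for the system under test.
NODES=${NODES:-4}
STREAM_BIN=${STREAM_BIN:-./stream}

# Print the command line for one node so it can be inspected before running.
stream_cmd_for_node() {
    printf 'numactl --cpunodebind=%s --membind=%s %s\n' "$1" "$1" "$STREAM_BIN"
}

node=0
while [ "$node" -lt "$NODES" ]; do
    cmd=$(stream_cmd_for_node "$node")
    echo "== node $node: $cmd"
    # Execute only when both numactl and the STREAM binary are available;
    # otherwise the loop just lists the commands it would have run.
    if command -v numactl >/dev/null 2>&1 && [ -x "$STREAM_BIN" ]; then
        $cmd | grep -E '^(Copy|Scale|Add|Triad):'
    fi
    node=$((node + 1))
done
```

Comparing the Copy, Scale, Add and Triad lines across nodes makes an underperforming node easy to spot.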
- Peak memory bandwidth is achieved when STREAM is run on three cores of each NUMA node. For
example, the following run uses three cores per node (12 threads in total) and shows that the same
system achieves 5% higher STREAM bandwidth than when using all cores.
> export O64_OMP_AFFINITY="TRUE"
> export O64_OMP_AFFINITY_MAP="2,4,6,10,12,14,18,20,22,26,28,30"
> export OMP_NUM_THREADS=12
> ./stream
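
The affinity map in the commands above follows a regular pattern: the same three core offsets (2, 4, 6) are repeated within each block of eight core IDs. The sketch below rebuilds that list. It assumes eight cores per NUMA node and four nodes, which is inferred from the spacing of the map and not stated explicitly; build_affinity_map is a hypothetical helper, not part of any compiler or tool.

```shell
#!/bin/sh
# Sketch: generate a comma-separated core list that selects the same core
# offsets within every NUMA node.
# Assumption: core IDs are numbered contiguously, node by node.
# Usage: build_affinity_map <nodes> <cores_per_node> <offset>...
build_affinity_map() {
    bam_nodes=$1
    bam_cores_per_node=$2
    shift 2                      # remaining arguments are the per-node offsets
    bam_map=""
    bam_node=0
    while [ "$bam_node" -lt "$bam_nodes" ]; do
        for bam_off in "$@"; do
            bam_core=$((bam_node * bam_cores_per_node + bam_off))
            # Append with a comma separator except for the first entry.
            bam_map="${bam_map:+$bam_map,}$bam_core"
        done
        bam_node=$((bam_node + 1))
    done
    printf '%s\n' "$bam_map"
}

# Four nodes, eight cores each, offsets 2, 4 and 6 within each node.
build_affinity_map 4 8 2 4 6
```

The printed list can be assigned to O64_OMP_AFFINITY_MAP, with OMP_NUM_THREADS set to the number of entries.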
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 87380000, Offset = 1840
Total memory required = 2000.0 MB.
Each test is run 30 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 12
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 22461 microseconds.
(= 22461 clock ticks)