Specifications

April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide

Add: 16798.1705 0.1253 0.1248 0.1267

Triad: 16724.2629 0.1259 0.1254 0.1279

-------------------------------------------------------------

Repeat on each NUMA node. If the bandwidth achieved differs significantly from one NUMA node to another, check the memory configuration of the NUMA node that falls behind.
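One way to repeat the run per node is sketched below. It uses `numactl` to bind both the threads and the memory to a single node, which is a swapped-in technique and not necessarily the method this guide uses. The node count of 4, the helper name `stream_cmd_for_node`, and the `./stream` path are assumptions for illustration; the script only prints the commands unless the marked line is uncommented.

```shell
#!/bin/sh
# Sketch: one STREAM invocation per NUMA node, CPU and memory bound to that node.
# Assumptions: numactl is installed, nodes are numbered 0..NODES-1,
# and ./stream is the benchmark binary in the current directory.

# Hypothetical helper: print the command line for a given node number.
stream_cmd_for_node() {
    echo "numactl --cpunodebind=$1 --membind=$1 ./stream"
}

NODES=${NODES:-4}   # assumed node count; verify with `numactl --hardware`
node=0
while [ "$node" -lt "$NODES" ]; do
    cmd=$(stream_cmd_for_node "$node")
    echo "node $node: $cmd"
    # Uncomment the next line to actually run the benchmark on this node:
    # $cmd
    node=$((node + 1))
done
```

Comparing the Copy, Scale, Add and Triad rates across the per-node runs should make an under-populated or misconfigured node stand out.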

- Peak memory bandwidth is achieved when STREAM is run on three cores of each NUMA node. For example, the following run shows that the same system achieves STREAM results 5% better than when using all cores.

> export O64_OMP_AFFINITY="TRUE"

> export O64_OMP_AFFINITY_MAP="2,4,6,10,12,14,18,20,22,26,28,30"

> export OMP_NUM_THREADS=12

> ./stream
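The affinity map in the commands above follows a regular pattern: the same three core offsets repeated on each node. A minimal sketch of how such a map could be generated is shown here. It assumes cores are numbered contiguously with 8 cores per NUMA node and 4 nodes, which matches the map above but should be checked on the target system. The helper `build_affinity_map` is hypothetical and not part of the compiler runtime or STREAM.

```shell
#!/bin/sh
# Sketch: build an affinity map that selects the same core offsets on every NUMA node.
# Assumptions: cores are numbered contiguously, cores_per_node cores per node.

# build_affinity_map NODES CORES_PER_NODE "OFFSETS"
build_affinity_map() {
    nodes=$1
    cores_per_node=$2
    offsets=$3
    map=""
    node=0
    while [ "$node" -lt "$nodes" ]; do
        for off in $offsets; do
            core=$((node * cores_per_node + off))
            # Append with a comma separator, except before the first entry.
            map="${map:+$map,}$core"
        done
        node=$((node + 1))
    done
    echo "$map"
}

build_affinity_map 4 8 "2 4 6"
# prints 2,4,6,10,12,14,18,20,22,26,28,30
```

The result can be assigned to `O64_OMP_AFFINITY_MAP`, with `OMP_NUM_THREADS` set to the number of entries in the map (12 here).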

-------------------------------------------------------------

STREAM version $Revision: 5.9 $

-------------------------------------------------------------

This system uses 8 bytes per DOUBLE PRECISION word.

-------------------------------------------------------------

Array size = 87380000, Offset = 1840

Total memory required = 2000.0 MB.

Each test is run 30 times, but only

the *best* time for each is used.

-------------------------------------------------------------

Number of Threads requested = 12

-------------------------------------------------------------

Printing one line per active thread....

-------------------------------------------------------------

Your clock granularity/precision appears to be 1 microseconds.

Each test below will take on the order of 22461 microseconds.

(= 22461 clock ticks)