Specifications
April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide
• If performance is paramount, enable C6, APM, and HPC P-state mode, and set the Linux CPU frequency governor to
performance. The cores will then run at frequencies between software P0 and Pb0 but will never drop to the lower
frequencies associated with software P1, P2, and so on. This lets the processor run at the fastest possible clock
speeds but increases the power drawn by the system.
• If stable performance is paramount (that is, you want to minimize jitter in the frequencies at which all of the
cores run), disable APM and set the Linux governor to performance. The cores will then run at the software P0
frequency only, which gives more consistent results for benchmarking and comparison purposes. When power
management is enabled, it can be difficult to tell whether performance differences are due to clock speed
changes or to other factors.
• If performance per watt is paramount, enable C6 and APM, disable HPC P-state mode, and set the Linux governor
to ondemand. Extreme HPC workloads (e.g., HPL and even parts of SPEC CFP2006rate) will then run at P-state
frequencies slower than software P0, spending significant time in P1 and possibly even P2. Overall performance
may be lower, but the reduction in power consumption should be large enough to improve the ratio of performance
to power consumption of the system.
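C6, APM, and HPC P-state mode are normally configured in the system BIOS setup, so only the governor part of these recipes can be applied from a running Linux system. The sketch below is a hedged illustration, not an AMD-provided tool: it records the three recipes in a table and writes the chosen governor to the standard cpufreq sysfs file `scaling_governor` for every CPU. `POLICIES` and `apply_governor` are hypothetical names, and the sysfs root is a parameter so the function can be exercised without root privileges.

```python
import glob
import os

# Hypothetical summary of the three recipes above. The BIOS entries cannot be
# changed from Linux; they are listed only as a reminder of what to check.
POLICIES = {
    "performance": {"bios": {"C6": True, "APM": True, "HPC_mode": True}, "governor": "performance"},
    "stable": {"bios": {"APM": False}, "governor": "performance"},
    "perf_per_watt": {"bios": {"C6": True, "APM": True, "HPC_mode": False}, "governor": "ondemand"},
}


def apply_governor(goal, sysfs_root="/sys/devices/system/cpu"):
    """Write the governor for `goal` to every CPU's scaling_governor file.

    Returns the list of files written. Writing the real sysfs files requires root.
    """
    governor = POLICIES[goal]["governor"]
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/.
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(governor)
        written.append(path)
    return written
```

Run as root, `apply_governor("stable")` sets every CPU to the performance governor; the BIOS side (APM disabled) still has to be set separately.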
4.4 Thread to Core Assignment Considerations
For a 1-socket IL-16 part with 16 cores running just two single-threaded jobs, the best allocation is to bind the
jobs to core 0 and core 8, placing one job on each die. More generally, order the allocation to maximize the
shared resources available to each job: a round-robin order typically works best, going first over dies (eight
cores per die, sharing memory and L3) and then over core pairs (two cores per pair, sharing L2, FPU, and
fetch/decode).
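Binding can be done from the shell with `taskset -c 0 ./job` or `numactl --physcpubind=0 ./job`. The Python sketch below does the same thing programmatically with `os.sched_setaffinity`, which is available on Linux only. `run_pinned` is a hypothetical helper, and it assumes the kernel numbers cores so that cores 0-7 sit on the first die and 8-15 on the second, as the text does; check this against `lscpu` or `/sys/devices/system/cpu/cpu*/topology` before relying on it.

```python
import os
import subprocess


def run_pinned(cmd, core, **popen_kwargs):
    """Start `cmd` (a list of arguments) with its CPU affinity limited to `core`.

    The affinity is set in the child between fork and exec, so the job starts
    on the requested core and any threads it creates inherit the same mask.
    """
    return subprocess.Popen(
        cmd,
        preexec_fn=lambda: os.sched_setaffinity(0, {core}),
        **popen_kwargs,
    )
```

For the two-job case, `jobs = [run_pinned(["./job"], core) for core in (0, 8)]` followed by `for j in jobs: j.wait()` would start one job per die (`./job` is a placeholder). Note that Python documents `preexec_fn` as unsafe when the parent process has threads, so a launcher that is itself multithreaded should prefer `taskset` or `numactl`.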
Note that for some applications you may want to minimize power use rather than maximize performance, in which
case filling jobs from core 0 in a linear fashion could be the best strategy. This allows the unused core pairs to
enter the C6 sleep state, saving considerable power.
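To see why linear filling saves power, it helps to count how many core pairs are left completely idle (and therefore free to enter C6) under each placement. The sketch below assumes, as the rest of this section does, that cores 2k and 2k+1 form core pair k on a 16-core part; `idle_core_pairs` is a hypothetical helper.

```python
def idle_core_pairs(busy_cores, total_cores=16, cores_per_pair=2):
    """Return the indices of core pairs that have no busy core.

    Assumes cores 2k and 2k+1 form core pair k.
    """
    busy_pairs = {core // cores_per_pair for core in busy_cores}
    return [pair for pair in range(total_cores // cores_per_pair) if pair not in busy_pairs]


print(idle_core_pairs([0, 1, 2, 3]))    # linear fill:    [2, 3, 4, 5, 6, 7]
print(idle_core_pairs([0, 8, 2, 10]))   # spread (round-robin): [2, 3, 6, 7]
```

With four jobs, the linear fill leaves six of the eight core pairs idle, including every pair on the second die (cores 8-15), while the round-robin placement leaves only four idle.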
The term job means a single-threaded unit of code such as an MPI rank or a single thread of an OpenMP
application.
The interesting cases are:
• No cores active.
• 1 core active.
• All cores active.
• Some number of cores active between 1 and all.
Next, decide what you want to achieve in terms of power usage and resource allocation. Your choices are:
• Allocate jobs to minimize the total power used and allow most resources to be turned off.
- Allocate from core 0 in a linear fashion until all cores are active.
• Allocate jobs to maximize performance by keeping as many resources as possible turned on and available to
each job. Allocate round-robin over each of the following levels in turn; when a level is filled, go round-robin
over the next level.
- Over sockets, skip by 16.
- Over dies / NUMA memory domains, skip by 8.
- Over core pairs, skip by 2.
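Both strategies can be expressed as functions that return the order in which cores are handed to successive jobs. The sketch below assumes the numbering implied by the skip distances above (core = socket*16 + die*8 + pair*2 + core-in-pair for 16-core parts), which should be confirmed with `lscpu` on the actual system. `linear_order` and `spread_order` are hypothetical names, and using the second core of each pair only after every pair has one job is an assumption about how the last level continues.

```python
from itertools import product


def linear_order(total_cores):
    """Power-saving order: fill from core 0 upward."""
    return list(range(total_cores))


def spread_order(sockets=2, dies_per_socket=2, pairs_per_die=4, cores_per_pair=2):
    """Performance order: round-robin over sockets, then dies, then core pairs.

    Assumes core = socket*16 + die*8 + pair*2 + core_in_pair for the default
    16-core parts (the 'skip by 16 / 8 / 2' numbering).
    """
    cores_per_die = pairs_per_die * cores_per_pair
    cores_per_socket = dies_per_socket * cores_per_die
    order = []
    # itertools.product varies its last argument fastest, so consecutive jobs
    # change socket first, then die, then core pair; the second core of each
    # pair is used only after every pair already has one job.
    for c, p, d, s in product(range(cores_per_pair), range(pairs_per_die),
                              range(dies_per_socket), range(sockets)):
        order.append(s * cores_per_socket + d * cores_per_die + p * cores_per_pair + c)
    return order


print(spread_order()[:8])            # [0, 16, 8, 24, 2, 18, 10, 26]
print(spread_order(sockets=1)[:4])   # [0, 8, 2, 10]
```

For a 1-socket part the first two entries are 0 and 8, and for a 2-socket node the first four are 0, 16, 8, 24, which agrees with the allocations given in this section.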
For a 2-socket node with 16-core parts, the highest-performance core allocation order is:
• Two jobs - Use both sockets, giving each job its own L3 cache: 0, 16.
• Four jobs - Use all four dies and all memory controllers: 0, 8, 16, 24.