AMD Opteron™ 6200 Series Processors Linux Tuning Guide
© 2012 Advanced Micro Devices, Inc. All rights reserved. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice.
April 2012 v1 AMD Opteron™ 6200 Linux Tuning Guide

Contents
1.0 Introduction
1.1 Intended Audience
1.2 AMD’s New Core Architecture Overview
1.3 Shared Resources
1.4 Dedicated Resources
1.
1.0 Introduction
This guide provides configuration, optimization, and tuning information and recommendations for AMD Opteron™ 6200 Series processors (formerly code-named “Interlagos” and built on AMD’s new core architecture) running in a Linux environment. This guide is designed to help users through initial BIOS and system setup to ensure that a base level of performance is achieved.
The AMD Opteron™ 6200 Series processors can boost core frequencies by allocating more power to individual cores, subject to a total power limit and other restrictions. Idle cores can “go to sleep” and stop drawing power, allowing more power to be dedicated to active cores. As a result, a single-core job could see a processor running at 3.2 GHz, while a job with all cores active could see cores running at 2.
1.3 Shared Resources
AMD Opteron™ 6200 Series processors’ shared resources boost single-core performance. The processor has numerous features designed to boost the performance of single-core jobs and of job loads that do not use all the cores. This is beneficial but, unless understood, could be mistaken for poor scaling. The goal is to run single-core jobs faster and, where possible, reallocate resources and power to boost performance.
1.5 Floating Point Capabilities
Today’s server workloads require a broad mix of processor capabilities, from those using mostly integer operations to those where floating point performance is paramount. The challenge for a general purpose processor is to be fast and power efficient at both of these extremes.
2.0 Getting Started
Start by ensuring your system is configured properly. Since memory configuration is critical to performance and vulnerable to misconfiguration, you will verify it in three ways: by physical inspection, by observing Linux’s view of the memory configuration, and finally by measuring memory performance with the STREAM benchmark.
In this section you will:
• Check physical memory configuration.
• Select specific BIOS options.
9. Virtualization, AMD-V = Disabled.
10. OS Power Management = Disabled.

Note: Always refer to your motherboard or system owner’s manual for further BIOS setting information. For production HPC system BIOS recommendations, see section 4.0 Configure a Performant Production System below. See also the BIOS and Kernel Developer’s Guide (BKDG) at: http://support.amd.com/us/Processor_TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf.
node distances:
node   0   1   2   3
  0:  10  16  16  16
  1:  16  10  16  16
  2:  16  16  10  16
  3:  16  16  16  10

If the size: value differs between nodes, then the DIMMs are either not identical or not plugged into the right sockets. If numactl --hardware shows only one node, then ACPI is not operating properly in the kernel.
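The two checks described above (equal node sizes, more than one node) can be automated. The following is a minimal Python sketch, assuming the usual `node N size: X MB` lines printed by numactl --hardware. The function name `check_numa_memory` and the `tolerance_mb` allowance are illustrative assumptions, not part of this guide; the allowance exists because reported sizes can differ by a small amount even with identical DIMMs.

```python
import re


def check_numa_memory(numactl_output, tolerance_mb=512):
    """Return a list of warnings for the text printed by `numactl --hardware`.

    tolerance_mb is an assumed allowance for small per-node size differences.
    """
    # Lines of interest look like: "node 0 size: 16384 MB"
    sizes = {
        int(node): int(mb)
        for node, mb in re.findall(
            r"^node (\d+) size: (\d+) MB", numactl_output, re.M
        )
    }
    warnings = []
    if len(sizes) < 2:
        warnings.append(
            "fewer than two NUMA nodes reported; check ACPI/NUMA support"
        )
    if sizes and max(sizes.values()) - min(sizes.values()) > tolerance_mb:
        warnings.append(
            "node sizes differ: %s; check DIMM population" % sorted(sizes.items())
        )
    return warnings
```

The input text could be captured with `subprocess.run(["numactl", "--hardware"], capture_output=True, text=True).stdout`; an empty list means neither condition was detected.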
• Run STREAM
Run STREAM on all 32 cores of a 2P system with AMD Opteron™ 6276 processors and 64 GB (8 x 8 GB) of 1600 MHz memory as follows:
> export OMP_NUM_THREADS=32
> ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
At least 20% better STREAM performance can be achieved using GCC 4.6.0 or later, but these later versions are unlikely to be included in Linux distributions today. However, 70% better STREAM performance can be achieved by using the AMD Open64 compiler.

2.6 High Performance STREAM Using AMD Open64 Compiler
Build STREAM using the Open64 compiler when attempting to measure the best achievable memory performance.
• Download and install the AMD Open64 compiler.
Add:        16798.1705   0.1253   0.1248   0.1267
Triad:      16724.2629   0.1259   0.1254   0.1279
-------------------------------------------------------------

Repeat on each NUMA node. If there is a significant difference between the bandwidth achieved on each NUMA node, check the memory configuration that corresponds to that NUMA node.
- Peak memory bandwidth is achieved when STREAM is run on three cores of each NUMA node.
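The per-node comparison described above can be scripted once the STREAM output for each node has been saved. The following Python sketch assumes the STREAM 5.9 result rows shown in this guide (`Add:`, `Triad:`, and so on, followed by the rate in MB/s). The helper names and the 10% threshold for a "significant" difference are assumptions chosen for illustration.

```python
import re


def parse_stream_rates(stream_output):
    """Extract {function name: rate in MB/s} from STREAM's result rows."""
    rows = re.findall(
        r"^(Copy|Scale|Add|Triad):\s+([0-9.]+)", stream_output, re.M
    )
    return {name: float(rate) for name, rate in rows}


def slow_nodes(triad_by_node, tolerance=0.10):
    """Return nodes whose Triad rate is more than `tolerance` below the best.

    tolerance=0.10 (10%) is an assumed threshold, not a value from this guide.
    """
    best = max(triad_by_node.values())
    return sorted(
        node
        for node, rate in triad_by_node.items()
        if rate < best * (1.0 - tolerance)
    )
```

For example, collecting `parse_stream_rates(text)["Triad"]` for each node into a dictionary and passing it to `slow_nodes` returns the nodes whose memory configuration deserves a closer look.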
Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)
Copy:         73113.2180
Scale:        74003.6161
Add:          68265.6350
Triad:        67651.
3.0 Operating System and Software Choices
This section helps you understand the various operating systems, compilers, and libraries you should consider when attempting any optimization or tuning processes.
In this section, you will:
• See the HPC Sample Operating System (OS) & Compiler Configurations.
• See the Commercial Options for Linux.
• Understand Compiler Choices.
• Find Linux Kernel Versions and Distributions.
3.2 Linux Kernel Versions and Distributions
For best possible performance, you should use a Linux kernel version or distribution that fully supports and enables the new features of the AMD Opteron™ 4200/6200 Series processors.
3.
3.5 Compiling for AMD’s New Core Architecture Instructions
The shared floating point unit for the AMD Opteron™ 6200 Series processors features new FMA4 and XOP instructions that can improve floating point throughput for workloads. For more details on the new instructions see the AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions at http://support.amd.com/us/Embedded_TechDocs/43479.pdf.
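Before building with FMA4 or XOP code generation, it can be useful to confirm that the kernel reports both features on the target machine. The following Python sketch assumes the standard Linux /proc/cpuinfo layout, where these features appear as the `fma4` and `xop` entries on the `flags` line. The function name `has_fma4_and_xop` is a hypothetical helper, not something defined by this guide.

```python
def has_fma4_and_xop(cpuinfo_text):
    """Report whether the first 'flags' line lists the fma4 and xop features."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is a space-separated flag list.
            flags = set(line.split(":", 1)[1].split())
            return {"fma4": "fma4" in flags, "xop": "xop" in flags}
    # No flags line found: treat both features as absent.
    return {"fma4": False, "xop": False}
```

A typical call would be `has_fma4_and_xop(open("/proc/cpuinfo").read())`. Only the first processor entry is inspected, on the assumption that all sockets hold identical processors.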
3.6 Libraries
Special purpose high-performance libraries are another important contributor to application performance. A classic example of this is when a program makes extensive use of BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage) subroutines. In these cases, it is imperative to use an FMA4-enabled floating point library.
4.0 Configure a Performant Production System
This section provides tasks to help you check existing performance and improve future performance on production systems.

4.1 BIOS Configuration Options
The following are the recommended BIOS configurations for HPC.
• APM enabled, to allow core frequency boost.
P3 := Freq: 2300 MHz
P4 := Freq: 1400 MHz

In this example, P0 is the highest non-boosted P state in normal operation. Pb1 is the first boosted frequency. Having HPC P-state mode enabled reconfigures P1 through P3 to be set to the same frequency as P0. This prevents slower performance as long as the system TDP design allows continuous operation at these frequencies.

4.3.
• If performance is paramount, enable C6, APM, and HPC P-state Mode, and put the Linux cpufreq governor in performance mode. The processor will then run at frequencies between software P0 and Pb0 and will never drop to the lower frequencies associated with software P1, P2, etc. This allows the fastest possible clock speeds but increases the power drawn by the system.
• If stable performance is paramount (i.e.
• 16 jobs - Use all L2 caches and shared floating point units: 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30.
• 32 jobs - Use all L2 caches and shared floating point units: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31.
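The two core lists above follow a simple stride pattern: 16 jobs take every second core and 32 jobs take every core. The following Python sketch generates such lists and builds a pinning command, assuming a 32-core system numbered as in the lists above. `cores_for_jobs` and `numactl_command` are hypothetical helper names, and applying the same stride rule to job counts other than 16 and 32 is an assumption, not something the lists above confirm. The only numactl option used is `--physcpubind`.

```python
def cores_for_jobs(n_jobs, n_cores=32):
    """Return evenly strided core IDs, one per job.

    Reproduces the 16-job and 32-job lists; other job counts are an
    assumed extension of the same stride rule.
    """
    if n_jobs < 1 or n_cores % n_jobs != 0:
        raise ValueError("n_jobs must evenly divide n_cores")
    stride = n_cores // n_jobs
    return list(range(0, n_cores, stride))


def numactl_command(core, program):
    """Build an argument list that pins `program` to one core via numactl."""
    return ["numactl", "--physcpubind=%d" % core, program]
```

For instance, `[numactl_command(c, "./job") for c in cores_for_jobs(16)]` yields one command per even-numbered core, each suitable for passing to `subprocess.Popen`.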
5.0 Known Issues

5.1 Address Space Layout Randomization (ASLR)
The AMD Opteron™ 4200/6200 Series processors have a shared Level-1 instruction cache that, under specific circumstances, leads to different performance characteristics from previous processor generations.
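Because this issue is tied to ASLR, it can help to know which randomization mode the running kernel uses before investigating further. The following Python sketch assumes the standard Linux sysctl file /proc/sys/kernel/randomize_va_space and its documented values 0, 1, and 2. The function names are illustrative; the code only reports the setting and is not a recommendation to change it.

```python
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "as mode 1, plus heap (brk) randomized",
}


def describe_aslr(value):
    """Translate a randomize_va_space value into a short description."""
    return ASLR_MODES.get(value, "unknown mode %d" % value)


def read_aslr_mode(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current ASLR mode from the kernel sysctl file."""
    with open(path) as f:
        return int(f.read().strip())
```

On a Linux system, `describe_aslr(read_aslr_mode())` returns a one-line summary that can be recorded alongside benchmark results.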
6.0 Useful Tools
This section lists some tools you may find helpful when analyzing performance and measuring power states.
• CodeAnalyst - Use CodeAnalyst for application profiling. For details, see http://developer.amd.com/tools/CodeAnalyst/codeanalystlinux.
• PAPI - Performance Application Programming Interface (PAPI) provides access to hardware performance counters and underpins a host of university-developed performance tools.
7.0 AMD Reference Material
• X86 Open64 Compilers Suite: http://developer.amd.com/tools/open64/
• Using the x86 Open64 Compiler Suite: http://developer.amd.com/tools/open64/Documents/open64.html
• X86 Open64 4.2.5.1 Release Notes: http://developer.amd.com/tools/open64/assets/ReleaseNotes.txt
• AMD Developer Tools: http://developer.amd.com/tools/
• AMD Libraries (ACML, LibM, etc.): http://developer.amd.
©2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Opteron, AMD-V, AMD Virtualization, and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.