20 Performance-Centric Compiler Switches Chapter 3
32035 Rev. 3.22 November 2007
Compiler Usage Guidelines for AMD64 Platforms
3.1.2 General Performance Switches
To get a program running, start by compiling and linking without optimization. Use the optimization
level -O0, or select -g, to perform minimal optimization. At this level, you can debug a program easily
and isolate any coding errors exposed during porting to x86 or AMD64 platforms. Use option -tp
(target processor) to specify the target architecture. Options -tp k8-64 and -tp k8-64e generate
code that is supported on and optimized for AMD64 processors. Edition 7 supports the quad-core AMD
Opteron processor with option -tp barcelona-64 to generate 64-bit code and option -tp barcelona
to generate 32-bit code.
Note: The 64-bit PGI compiler can generate 32-bit binaries.
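As a minimal sketch of this first step, the script below assembles a debug build command for a 64-bit AMD64 target. It assumes the PGI C driver pgcc; the file names app.c and app_debug are placeholders that do not come from this guide. The script only prints the command line (a dry run), so it can be tried without the compiler installed.

```shell
#!/bin/sh
# Placeholder names (assumptions): app.c is the source, app_debug the output.
SRC=app.c
OUT=app_debug

# -O0 and -g keep the generated code close to the source for debugging;
# -tp k8-64 selects a 64-bit AMD64 target.
DEBUG_FLAGS="-O0 -g -tp k8-64"

# Assemble the command and print it (dry run; nothing is compiled here).
DEBUG_CMD="pgcc $DEBUG_FLAGS -o $OUT $SRC"
echo "$DEBUG_CMD"
```

Paste the printed line into a shell where the PGI compilers are on the PATH to perform the build.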
To get started quickly with optimization, use options -fast and -Mipa=fast with any PGI compiler.
For C++ programs, add -Minline=levels:10 --no_exceptions (a C++ program compiled with
--no_exceptions will fail if it uses exception handling). Beginning in Edition 7, the -fast
option became synonymous with the -fastsse option, and the optimizations performed by -fast in
previous releases were placed under the -nfast option.
Note: The -fastsse option is still necessary to compile 32-bit code.
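The quick-start switches above could be combined as in the following sketch. It assumes the PGI drivers pgcc (C) and pgCC (C++); all file names are placeholders that do not come from this guide. The script prints the command lines rather than running them.

```shell
#!/bin/sh
# Placeholder file names (assumptions, not from this guide).
C_SRC=app.c
CXX_SRC=app.cpp

# Starting point suggested for any PGI compiler.
FAST_FLAGS="-fast -Mipa=fast"

# Extra switches suggested for C++; --no_exceptions is only safe
# when the program does not use exception handling.
CXX_EXTRA="-Minline=levels:10 --no_exceptions"

# Assemble one command per language and print them (dry run).
C_CMD="pgcc $FAST_FLAGS -o app_c $C_SRC"
CXX_CMD="pgCC $FAST_FLAGS $CXX_EXTRA -o app_cxx $CXX_SRC"

echo "$C_CMD"
echo "$CXX_CMD"
```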
Beyond these options, further significant performance gains are generally possible. However,
individual optimizations can sometimes cause slowdowns, depending on coding style. The optimization
flags most likely to further improve performance are -O3, -Mpfi/-Mpfo, -Minline, and, on targets
with multiple processors, -Mconcur.
The --zc_eh option allows zero-cost exception handling for C++.
For C++ BASE optimization, use --zc_eh with -Mipa=fast,inline and -Msmartalloc=huge. The
huge flag enables the use of huge pages if the OS is configured to provide them.
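A sketch of the C++ BASE flag combination described above follows. It assumes the PGI C++ driver pgCC; the file names are placeholders that do not come from this guide, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Placeholder file name (assumption, not from this guide).
CXX_SRC=app.cpp

# --zc_eh: zero-cost exception handling for C++.
# -Mipa=fast,inline: interprocedural analysis with inlining.
# -Msmartalloc=huge: uses huge pages if the OS is configured to provide them.
BASE_FLAGS="--zc_eh -Mipa=fast,inline -Msmartalloc=huge"

# Assemble the command and print it (dry run).
BASE_CMD="pgCC $BASE_FLAGS -o app_base $CXX_SRC"
echo "$BASE_CMD"
```

Unlike --no_exceptions, this combination keeps exception handling available to the program.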
3.1.3 Optimization Switches
In addition to the -tp (target processor) switch, the following switches may improve the
performance of the program. It is worth experimenting with these switches, but measure the result
of each change to confirm that it actually improves performance.
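One way to organize such an experiment is to build one binary per candidate flag set and then time each binary on the same input. The sketch below assumes the PGI C driver pgcc and a placeholder source file app.c; the flag sets are examples drawn from this chapter, and the output names app_v1, app_v2, and so on are hypothetical. The script prints the build commands rather than running them.

```shell
#!/bin/sh
# Placeholder source name (assumption, not from this guide).
SRC=app.c

N=0        # counts the variants so each gets its own output name
CMDS=""    # accumulates the printed commands, separated by ';'

# Each quoted string is one candidate flag set to compare.
for FLAGS in "-O0" "-O2" "-fast" "-fast -Mipa=fast"; do
    N=$((N + 1))
    CMD="pgcc -tp k8-64 $FLAGS -o app_v$N $SRC"
    echo "$CMD"
    CMDS="$CMDS$CMD;"
done
```

After building, run each binary on a representative input, for example under the shell's time command, and keep a flag set only if it is measurably faster and the program output is unchanged.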
Local and Global Optimization using -O. Specify any of the following optimization levels
(-Olevel).
-O0 (level-0) specifies no optimization. This optimization level generates a basic block for each
language statement. This is useful for debugging since there is a direct correlation between the
program text and the code generated.
-O1 (level-1) specifies local optimization. This optimization level performs scheduling of basic
blocks and allocates registers.
-O2 (level-2) specifies global optimization. This optimization level performs all level-one local
optimization as well as level-two global optimization.