Specifications

Chapter 3 Performance-Centric Compiler Switches 37
Compiler Usage Guidelines for AMD64 Platforms
32035 Rev. 3.22 November 2007
2. Run the executable produced in Step 1. The run generates several files containing profile
information (*.dyn and *.dpi).
3. Recompile the program with the -Qprof_use switch. Using the -Qipo switch as well at this
stage is recommended.
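As a rough illustration of why the profile gathered in Step 2 matters, the sketch below shows a function whose branch behavior depends entirely on the input data. The function name, the file name, and the use of icl as the compiler driver are assumptions made for this example and are not taken from this guide.

```c
#include <stddef.h>

/*
 * Hypothetical hot function: counts samples above a threshold.
 * How often the branch is taken depends on the data, which is the kind
 * of information the *.dyn and *.dpi files record during Step 2.
 *
 * Assumed workflow (pgo_demo.c is an illustrative file name):
 *   Step 2: run the instrumented executable from Step 1 on typical input
 *   Step 3: icl -Qprof_use -Qipo pgo_demo.c
 */
size_t count_above(const int *samples, size_t n, int threshold)
{
    size_t hits = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (samples[i] > threshold) {   /* branch bias is input-dependent */
            ++hits;
        }
    }
    return hits;
}
```

The profile is only as representative as the input used in Step 2, so the training run should resemble production workloads.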
-Oi-. For programs with many calls to memory-related library routines (such as memset and
memcpy), using the -Oi- switch may improve performance with Intel compiler versions 7.1 and 8.0.
This switch is not recommended for version 9.1.
-Qunroll[n]. This switch sets the maximum number of times to unroll a loop. Experiment with
values 1–4. For scientific programs, a particular value may slightly improve performance.
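To make the effect of -Qunroll more concrete, the sketch below writes out by hand roughly what an unroll count of 4 means for a simple summation loop. The function names are hypothetical, and the code the compiler actually generates will differ in detail.

```c
#include <stddef.h>

/* Rolled form: one add and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    size_t i;

    for (i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: one loop test per four elements, then a remainder loop. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; ++i)          /* handles the 0 to 3 leftover elements */
        s += a[i];
    return s;
}
```

Both functions return the same sum. The unrolled form trades larger code for fewer loop tests, which is why the best value has to be found by experiment.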
-Qansi-alias. Try this switch only if the program strictly conforms to the ISO C99 standard,
in particular to its aliasing rules. For conforming programs, this switch allows the compiler to
perform more aggressive optimizations.
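As a sketch of what conformance means in practice, the example below assumes that the relevant requirement is the ISO C rule against accessing an object through a pointer of an incompatible type. The function name is hypothetical.

```c
#include <string.h>

/*
 * Non-conforming idiom (shown only in this comment): reading a float
 * object through an unsigned int pointer violates the ISO C aliasing
 * rules, so a compiler told to assume conformance may reorder or drop
 * the access.
 *
 *     unsigned int bits = *(unsigned int *)&f;
 *
 * Conforming alternative: copy the bytes with memcpy.
 * Assumes float and unsigned int are both 32 bits wide.
 */
unsigned int float_bits(float f)
{
    unsigned int bits = 0;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that relies on the pointer-cast idiom should be rewritten along these lines before -Qansi-alias is tried.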
3.12 Microsoft® Compilers (32-Bit) for Microsoft® Windows®
The 32-bit Microsoft compilers can be installed and run on 32-bit Microsoft Windows and 64-bit
Microsoft Windows on AMD Athlon™ 64, AMD Opteron™, and AMD Family 10h processors. The
current version is Visual Studio 2008. All the options below apply to this version.
3.12.1 Invocation Command
The cl command invokes the Microsoft C/C++ compiler.
3.12.2 Generic Performance Switches
The /O2, /GL, /Oy, and /fp:fast switches almost always improve performance. The /O2 switch turns
on several general optimizations. The /GL switch enables whole-program interprocedural analysis
(IPA), and /Oy allows the compiler to use the frame-pointer register as a general-purpose register,
which usually results in better performance. Using /fp:fast allows the compiler to use fast math
library routines with extensive error checking turned off and, more generally, to follow a faster but
less predictable floating-point model. Applications that require high precision should therefore
avoid this switch.
For code that may be sensitive to cache size, consider using the /O1 compiler switch. /O1 generates
smaller code at the possible expense of instruction execution speed. However, the benefit of the
smaller code footprint may outweigh any loss due to slower instructions.
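One reason a fast floating-point model is less predictable is that the compiler may reorder floating-point operations, and floating-point addition is not associative. The sketch below, with hypothetical function names, evaluates the same three-term sum in two orders. It is not specific to cl and only illustrates the kind of difference reordering can introduce.

```c
/* Two algebraically equal sums that round differently in IEEE double. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;   /* volatile forces rounding to double here */
    return t + c;                /* (a + b) + c */
}

double sum_right(double a, double b, double c)
{
    volatile double t = b + c;   /* a small c can be lost when b is huge */
    return a + t;                /* a + (b + c) */
}
```

With a = 1e20, b = -1e20, and c = 1.0, sum_left returns 1.0 and sum_right returns 0.0. Code whose results must not change under such reordering is the kind of application that should avoid /fp:fast.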
3.12.3 Other Switches
In addition to the /O2, /GL, /Oy, and /fp:fast switches, the following switches may improve the
performance of the program. It is worth experimenting with them.