Optimizing Itanium-Based Applications (May 2011)

compiler-generated performance advice
The compiler emits performance-related advice when +wperfadvice[=1|2|3|4] is specified
(+wperfadvice alone is equivalent to +wperfadvice=2). Level 1 emits the fewest messages, and they are
the easiest to act on. Higher levels emit more suggestions; those emitted at levels 3 and 4
may require extensive or complicated source code changes to achieve performance benefits.
The scenarios currently detected by +wperfadvice include, but are not limited to:
- Passing large structures by value instead of by reference
- Lack of profile information and inability to perform profile-based optimizations
- Frequently executed indirect function calls that may perform better as direct calls
- Possibly inadvertent use of #pragma optimize off
- Frequently called routines that are not defined in the load module and cannot be inlined by the
  compiler
- Loops with constant trip counts that may be multi-versioned
- Inability to pipeline loops due to recurrence constraints
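
The first scenario in the list, passing large structures by value, can be illustrated with a minimal C sketch. The type Block and the functions fill_block, sum_by_value, and sum_by_reference are hypothetical names chosen for this illustration; they do not come from the compiler documentation. Both summing functions return the same result, but the by-value version gives the callee its own copy of the structure (about 8 KB, assuming 8-byte doubles), while the by-reference version passes only a pointer.

```c
#include <stddef.h>

#define BLOCK_LEN 1024

/* Hypothetical large structure: 1024 doubles, about 8 KB with 8-byte doubles. */
typedef struct {
    double samples[BLOCK_LEN];
} Block;

/* Set every element of the block to the same value. */
void fill_block(Block *b, double value)
{
    for (size_t i = 0; i < BLOCK_LEN; i++) {
        b->samples[i] = value;
    }
}

/* By value: the callee receives its own copy of the whole structure. */
double sum_by_value(Block b)
{
    double total = 0.0;
    for (size_t i = 0; i < BLOCK_LEN; i++) {
        total += b.samples[i];
    }
    return total;
}

/* By reference: only a pointer is passed; const signals that the
   callee does not modify the caller's data. */
double sum_by_reference(const Block *b)
{
    double total = 0.0;
    for (size_t i = 0; i < BLOCK_LEN; i++) {
        total += b->samples[i];
    }
    return total;
}
```

A caller would then write sum_by_reference(&block) in place of sum_by_value(block). Whether the change pays off in a given program depends on how often the routine is called and whether the compiler can inline it.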
putting it together with optimization option recipes
Many compiler options are available, some of which are detailed in this document. However,
a few options tend to provide big performance boosts for most applications.
- Use optimization level 2 (-O or +O2) at a minimum (+O3 for floating-point applications).
- Consider compiling with +O4 if not shipping archive libraries (if +O4 is not an option, consider using
  -minshared, +Bprotected_def and +Oshortdata to attain some of the benefit).
- Use PBO (profile-based optimization) for a potentially large improvement in performance (especially
  for large commercial applications). PBO provides even bigger improvements on top of +O4.
- Use +Ofast, which is safe and effective for the vast majority of programs.
- For memory-intensive programs, use large pages via the +pd and +pi linker options or chatr(1).
- For floating-point applications, as mentioned above, +O3 should be the minimum optimization level.
  Additionally, +Ofltacc=relaxed and +FPD (both included in +Ofast) often provide large
  improvements.
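
Relaxed floating-point accuracy matters because floating-point addition is not associative, so a compiler held to strict accuracy generally has to keep the evaluation order written in the source. The sketch below, in C with hypothetical function names, shows two orderings of the same three-term sum that can produce different results in IEEE 754 double arithmetic. It assumes that a relaxed accuracy mode permits this kind of reordering; consult the compiler documentation for the exact transformations +Ofltacc=relaxed allows.

```c
/* Evaluates (a + b) + c, the order written in the source. */
double sum_left_to_right(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c), a reassociated form of the same sum. */
double sum_reassociated(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, and c = 1.0, the first form yields 1.0 and the second yields 0.0, because the 1.0 is lost to rounding when it is added to -1e20. Programs that tolerate differences of this kind are the ones likely to benefit from relaxed accuracy.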