Optimizing Itanium-Based Applications (May 2011)
introduction
The HP Itanium-based optimizer transforms code so that it runs more efficiently on Itanium-based
HP-UX systems. The optimizer can dramatically improve application performance. However, compile
time and memory use increase with each higher level of optimization because of the increasingly complex
analysis that is performed.
This document discusses the following topics:
● Six levels of optimization
● Interprocedural optimizations
● High-level loop optimizations
● Advanced optimization options and pragmas
● Profile-based optimization
● Compiler-generated performance advice
● Putting it all together with optimization options recipes
Note that this version applies to the A.06.26 (AR1109) release of the HP compilers. For an overview of the
HP compiler technology, see HP Compilers for HP Integrity Servers[1].
six levels of optimization
There are six levels of optimization. Each level is a superset of the preceding level. Additional parameters
allow the user to control the aggressiveness of optimization, compile time, and the size of the resulting
executable.
level zero
+O0
description:
● Simple register assignment.
● Trivial scheduling (one instruction per cycle, one bundle per cycle).
● Should almost never be used.
benefits:
Fastest compile time; however, use of this optimization level is strongly discouraged due to the
poor quality of the resulting code.
level one
+O1 (default)
description:
● Local optimizations that operate on a single basic block, including common subexpression
elimination, constant folding, and load-store elimination.
● Performs data prefetching of simple array traversals.
● More sophisticated instruction scheduling.
● Register promotion of some scalar locals and C/C++ scalar formals.
● In C++, inlining of calls within a translation unit.
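The local optimizations listed above can be illustrated with a small hand-worked sketch. The two C functions below are hypothetical (their names and the arithmetic are assumptions for illustration, not taken from HP documentation). The first is code as a programmer might write it; the second shows, by hand, roughly what common subexpression elimination, constant folding, and load-store elimination could do within one basic block. The code the compiler actually generates at +O1 may differ.

```c
/* As written: x * y appears twice, 24 * 60 is a constant expression,
 * and *out is read back immediately after being stored. */
int scaled_as_written(int *out, int x, int y)
{
    *out = x * y + 24 * 60;
    return *out + x * y;
}

/* Hand-applied equivalent of the three local optimizations
 * (illustrative only; not actual compiler output). */
int scaled_hand_optimized(int *out, int x, int y)
{
    int prod = x * y;        /* common subexpression computed once */
    int val  = prod + 1440;  /* 24 * 60 folded to 1440 at compile time */
    *out = val;              /* the store itself must still happen */
    return val + prod;       /* reuse val instead of reloading *out */
}
```

Both functions return the same result and store the same value through out for any inputs that do not overflow. The second form simply makes explicit the redundant work that a basic-block optimizer is able to remove.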
benefits:
● Produces much faster code than +O0, with faster compile time than +O2.