Optimizing Itanium-Based Applications (May 2011)
● Better alias information and inlining improve and enable additional loop transformations.
interprocedural optimizations with -ipo
The HP high level optimizer contains an interprocedural optimizer, a high level loop optimizer, and a scalar
optimizer.
The interprocedural optimizer is enabled with the option -ipo at optimization levels two or higher (e.g. +O2
-ipo). Optimization level four (option +O4) implies -ipo.
The high level loop optimizer is fully enabled at optimization levels three or higher (options +O3 and +O4)
and performs optimizations such as loop interchange, loop distribution and loop fusion. Limited high level
loop optimizations are performed at +O2.
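As an illustration of one of these transformations, the following is a minimal hand-written sketch of loop interchange in C. It is not compiler output; the array sizes and the function names (sum_column_order, sum_row_order, check_interchange) are assumptions made only for this example. It shows the kind of rewrite a loop optimizer may perform when it can prove that the reordering does not change the result.

```c
#include <assert.h>

#define ROWS 4   /* illustrative sizes, not taken from the text above */
#define COLS 3

/* Before: the inner loop walks down a column, so consecutive accesses
   are COLS ints apart in C's row-major array layout. */
long sum_column_order(int a[ROWS][COLS])
{
    long s = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}

/* After interchange: the inner loop walks along a row, so consecutive
   accesses are adjacent in memory. The computed sum is unchanged. */
long sum_row_order(int a[ROWS][COLS])
{
    long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

/* Small self-check: both loop orders produce the same sum. */
void check_interchange(void)
{
    int a[ROWS][COLS];
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = i * COLS + j;
    assert(sum_column_order(a) == sum_row_order(a));
}
```

The interchange is legal here because the additions are independent of the iteration order. Whether a compiler applies it in a given case depends on its own dependence analysis and cost model.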
The high level scalar optimizer is enabled along with the other high level optimizations and performs
expression simplification and canonicalization, dead code removal, copy propagation, constant
propagation, partial redundancy elimination, partial dead store elimination, as well as control flow
optimizations and basic block cloning.
This chapter focuses on the benefits of the interprocedural optimizer.
The option -ipo can be used to compile some or all of an application’s source files. Compiling only some
modules with -ipo enables intermodule optimizations between those files. In this mode, only parts of the
application are analyzed during IPO by the compiler and therefore the compiler has to make pessimistic
assumptions about the rest of the application. As a result, some optimization opportunities may be
missed.
For highest performance, it is beneficial to compile all of an application’s source files with -ipo; this is
called whole program mode. In this mode, the compiler can perform precise analysis of an application,
potentially resulting in better performance.
The high level optimizer makes use of PBO information and is more effective when used in combination
with PBO (option +Oprofile=use). For example, PBO data improves function inlining. PBO data can also
reveal the most likely callee at an indirect call site, allowing the high level optimizer to transform the
indirect call into a test and a direct call.
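The indirect call transformation described above can be sketched by hand in C. This is an illustration only, not compiler output: the callees add_one and times_two, the type op_fn, and the assumption that profile data identified add_one as the most likely callee are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callees, named only for this illustration. */
static int add_one(int v)   { return v + 1; }
static int times_two(int v) { return v * 2; }

typedef int (*op_fn)(int);

/* Original form: every call goes through the function pointer. */
int apply_indirect(op_fn op, int v)
{
    assert(op != NULL);
    return op(v);
}

/* Hand-written equivalent of the transformed form, assuming profile
   data showed add_one to be the most likely callee at this call site. */
int apply_promoted(op_fn op, int v)
{
    assert(op != NULL);
    if (op == add_one)
        return add_one(v);  /* test + direct call; now an inlining candidate */
    return op(v);           /* fallback: the original indirect call */
}
```

Both functions return the same value for any callee. The promoted form only changes how the likely case is reached, which in turn exposes the direct call to further optimizations such as inlining.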
Application performance currently benefits from interprocedural optimization in the following ways:
Interprocedural analysis of memory references and function arguments enables and improves many
optimizations. For example, it yields several additional optimization opportunities in the low level
optimizer, including register promotion.
Consider this example:
void foo( int *x, int *y )
{
    ... = *x;   // load 1
    *y = ...;   // store 1
    ... = *x;   // load 2
}
Without any additional knowledge about the properties of the pointers x and y, the compiler has to
issue a second load instruction (load 2), since the store (store 1) may overwrite the memory location
that x points to.
If, as a result of interprocedural analysis, the compiler is able to determine that x and y never alias
(never point to the same memory location), the compiler can promote the value of *x into a register
and reuse this register instead of issuing load 2.
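A concrete version of the foo example makes the aliasing hazard visible. The sketch below is hand-written for illustration; the function names and the values stored are assumptions, and the second function only mimics in source form what register promotion does at the instruction level.

```c
#include <assert.h>

/* Conservative form: *x is reloaded after the store through y,
   because the store may have changed it when x == y. */
int sum_may_alias(int *x, int *y)
{
    int a = *x;     /* load 1 */
    *y = a + 1;     /* store 1: modifies *x if x and y alias */
    int b = *x;     /* load 2: must be reissued */
    return a + b;
}

/* Promoted form: valid only if x and y never alias. The value of *x
   stays in a local (a register) and load 2 disappears. */
int sum_no_alias(int *x, int *y)
{
    assert(x != y); /* precondition the compiler would have to prove */
    int a = *x;     /* load 1, value kept in a register */
    *y = a + 1;     /* store 1 */
    return a + a;   /* reuse of the register replaces load 2 */
}
```

With distinct pointers both functions return the same result. When both arguments point to the same int, sum_may_alias sees the updated value on the second load, which is why the compiler cannot remove that load without alias information.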