Optimizing Itanium-Based Applications (May 2011)
● Debugging correctness of code is maintained. Breakpoints behave as expected and variables have
expected values at breakpoints. See Section 14.27 (Debugging optimized code) in Debugging with
GDB[2] for more information on this topic.
level two
+O2 or -O
description:
● Performs Level 1 optimizations, plus optimizations performed over entire functions.
● Performs intra-module inlining with tuned-down heuristics, which keeps compile times fast while
still offering potential performance gains.
● Performs global optimizations, code motion, and register promotion.
● Performs loop optimizations such as data prefetching (more aggressive than at level one), sum
reduction, scalar replacement, strength reduction, unrolling, rerolling, fusion, unswitching, and
post-increment synthesis.
● Performs additional optimizations, including FMA synthesis and dead code elimination.
● Optimizes calls to certain library routines if the system headers for those routines are
included. For example, calls to sqrt, sin, and cos, and certain calls to memory copy and compare
routines, can be inlined. Commoning of library calls (reusing the result of an identical earlier
call) can also occur.
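As an illustration of the kind of loop these optimizations target, the sketch below shows a dot product in C. The function names are hypothetical, and the second function is a hand-written approximation of what a sum-reduction style transformation might produce. It is not the compiler's actual output, and whether a given compiler applies each transformation depends on the code and the options used.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward dot product. The multiply feeding an add is a
   candidate for FMA synthesis, and the loop as a whole is a
   candidate for unrolling and data prefetching. */
double dot(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-written sketch (an assumption, for illustration only) of an
   unrolled loop with two independent partial sums. Splitting the
   accumulator shortens the dependence chain between iterations so
   that more multiply-adds can be in flight at once. */
double dot_two_accumulators(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                  /* leftover element when n is odd */
        s0 += a[i] * b[i];
    return s0 + s1;
}
```

Because the second form adds the products in a different order, its floating-point result can differ slightly from the first in the last bits, which is one reason such reassociation is usually subject to the compiler's floating-point settings.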
Additionally, the optimizer employs a suite of transformations that take advantage of key Itanium
architectural features to improve the instruction level parallelism of applications. For example, the
scheduler performs techniques such as predication, control speculation, and data speculation.
Predication converts control flow into conditionally executed instructions, which both eliminates
branch instructions and allows multiple execution paths to be executed simultaneously.
Speculation allows code to be executed earlier than it would be under the order specified by the
developer.
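The effect of predication can be approximated at the source level. The following sketch uses hypothetical function names: the first version expresses a clamp with branches, and the second computes both conditions up front and selects the result, which is roughly the shape that predicated code takes. On Itanium the conditions would be held in predicate registers that guard individual instructions; the C below only mimics that idea and is not what the compiler emits.

```c
#include <assert.h>

/* Branch-based form: each "if" is a potential branch instruction. */
int clamp_branch(int x, int lo, int hi)
{
    assert(lo <= hi);
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* Predication-style form (illustrative assumption): both conditions
   are evaluated unconditionally, then each assignment takes effect
   only if its condition holds. No control flow remains. */
int clamp_predicated(int x, int lo, int hi)
{
    assert(lo <= hi);
    int below = x < lo;      /* plays the role of one predicate     */
    int above = x > hi;      /* plays the role of a second predicate */
    int r = x;
    r = above ? hi : r;      /* "executed" only when above is true   */
    r = below ? lo : r;      /* "executed" only when below is true   */
    return r;
}
```

Both functions return the same value for every input; the second trades a few extra operations for the absence of branches, which is the trade-off the scheduler weighs when it applies predication.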
To apply these scheduling techniques effectively and efficiently, the code is divided into regions
that are each optimized as a unit. Innermost loops are software pipelined whenever possible, using
special branches and rotating registers to achieve an efficient schedule. Predication enables
software pipelining of loops that contain control flow. Both types of speculation are also supported
for modulo-scheduled loops.
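Software pipelining overlaps the stages of successive loop iterations. The sketch below, with hypothetical function names, pipelines a simple scaling loop by hand in C: the load for iteration i+1 is issued before the multiply and store for iteration i. It is only a source-level analogy. In compiler-generated code, rotating registers would typically make the explicit copy between iterations unnecessary, and the special loop branches would handle the ramp-up and ramp-down that appear here as a separate prologue and epilogue.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: each iteration loads, multiplies, and stores. */
void scale(double *dst, const double *src, size_t n, double k)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand-pipelined sketch (illustrative assumption). Assumes dst and
   src either are the same array or do not overlap at all. */
void scale_pipelined(double *dst, const double *src, size_t n, double k)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0)
        return;

    double loaded = src[0];            /* prologue: first load        */
    for (size_t i = 0; i + 1 < n; i++) {
        double next = src[i + 1];      /* load stage, iteration i+1   */
        dst[i] = loaded * k;           /* compute and store, iter. i  */
        loaded = next;                 /* hand value to next iteration */
    }
    dst[n - 1] = loaded * k;           /* epilogue: drain last value  */
}
```

The two functions produce identical results; the second simply makes visible the overlap that a modulo scheduler would arrange automatically.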
This level of optimization limits the ability to debug the application. See Section 14.27 (Debugging
optimized code) in Debugging with GDB[2] for more information on this topic.
benefits:
● Significantly faster code than Level 1 produces, owing to more extensive optimization and better
use of machine resources and Itanium architectural features.
● Performance of non-numeric applications can improve by 50% or more.
● Loop-intensive numeric applications achieve even greater speedups due to optimizations such as
more aggressive data prefetching and software pipelining.
level two -ipo
+O2 -ipo or -O -ipo
description: