Optimizing Itanium-Based Applications (May 2011)

● Debugging correctness of code is maintained. Breakpoints behave as expected and variables have

expected values at breakpoints. See Section 14.27 (Debugging optimized code) in Debugging with

GDB[2] for more information on this topic.

level two

+O2 or -O

description:

● Performs Level 1 optimizations, plus optimizations performed over entire functions.

● Performs intra-module inlining with tuned-down heuristics to keep compile times fast while still offering potential performance gains.

● Performs global optimizations, code motion, and register promotion.

● Performs loop optimizations such as data prefetching (more aggressive than at level one), sum reduction, scalar replacement, strength reduction, unrolling, rerolling, fusion, unswitching, and post-increment synthesis.

● Performs additional optimizations, including FMA synthesis and dead code elimination.

● Optimizes calls to certain library routines if the system headers for those routines are included. For example, calls to sqrt, sin, and cos, and certain memory copy and compare calls, can be inlined. Commoning of library calls can also occur.
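
To make a few of the loop optimizations listed above concrete, the following is a minimal hand-written sketch in C, not compiler output. It assumes a simple strided summation; the function names sum_strided_naive and sum_strided_transformed are hypothetical. The second function applies, by hand, the kind of rewrite that strength reduction (the index multiply becomes an addition), unrolling by two, and a sum split across two accumulators might produce. The transformations the compiler actually performs at +O2 may differ.

```c
#include <stddef.h>

/* Straightforward form: every iteration multiplies the index by the stride. */
long sum_strided_naive(const long *a, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* Hand-transformed form (illustrative only):
 *   - strength reduction: the multiply i * stride is replaced by a
 *     running offset k that is advanced by addition;
 *   - unrolling by two: each trip handles two elements;
 *   - two partial sums: s0 and s1 are independent, so their additions
 *     can be scheduled in parallel and are combined at the end. */
long sum_strided_transformed(const long *a, size_t n, size_t stride)
{
    long s0 = 0, s1 = 0;
    size_t k = 0;               /* running offset, equals i * stride */
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[k];
        s1 += a[k + stride];
        k += 2 * stride;
    }
    if (i < n)                  /* leftover element when n is odd */
        s0 += a[k];

    return s0 + s1;
}
```

Both functions return the same result for integer data. For floating-point data, splitting a sum across accumulators changes the order of additions and can therefore change rounding, which is one reason such rewrites are best left to the compiler under the appropriate options.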

Additionally, the optimizer employs a suite of transformations that take advantage of key Itanium

architectural features to improve the instruction level parallelism of applications. For example, the

scheduler performs techniques such as predication, control speculation, and data speculation.

Predication converts control flow into conditionally executed instructions, which both eliminates branch instructions and allows multiple execution paths to execute simultaneously. Speculation allows code to execute earlier than it would under the order specified by the developer.
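
The effect of predication can be approximated at the source level, as in the sketch below. This is only an analogy under stated assumptions: real predication happens at the instruction level using predicate registers, and the compiler is free to compile either form either way. The function names clamp_branch and clamp_select are hypothetical.

```c
/* Branching form: control flow decides which value is returned. */
int clamp_branch(int x, int lo, int hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* If-converted form (illustrative only): the two comparisons play the
 * role of predicates, and every assignment is "executed" on a single
 * straight-line path, with the predicate selecting whether it takes
 * effect. The below-range test is applied last so that it wins, which
 * matches the order of the tests in clamp_branch. */
int clamp_select(int x, int lo, int hi)
{
    int below = x < lo;         /* predicate for the first path  */
    int above = x > hi;         /* predicate for the second path */
    int r = x;

    r = above ? hi : r;
    r = below ? lo : r;
    return r;
}
```

In the second form there is no branch to mispredict, and the two comparisons are independent of each other, so they can be issued together.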

To apply these scheduling techniques effectively and efficiently, the code is divided into regions, each of which is optimized as a unit. Innermost loops are software pipelined whenever possible, using special branches and rotating registers to produce an efficient schedule. Predication enables software pipelining of loops that contain control flow. Both types of speculation are also supported for modulo-scheduled loops.
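
The idea behind software pipelining can be sketched by hand in C. The example below is an illustration under stated assumptions, not what the compiler emits: it splits each iteration of a simple scaling loop into two stages (load, then multiply and store) and overlaps the load for iteration i + 1 with the multiply and store for iteration i. The function names scale_simple and scale_pipelined are hypothetical, and the arrays a and b are assumed not to overlap.

```c
#include <stddef.h>

/* Original loop: load, multiply, and store all belong to one iteration. */
void scale_simple(const int *a, int *b, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] * k;
}

/* Hand-pipelined form (illustrative only; a and b must not overlap). */
void scale_pipelined(const int *a, int *b, size_t n, int k)
{
    if (n == 0)
        return;

    int loaded = a[0];              /* prologue: stage 1 of iteration 0 */

    for (size_t i = 0; i + 1 < n; i++) {
        int next = a[i + 1];        /* kernel: stage 1 of iteration i+1 */
        b[i] = loaded * k;          /* kernel: stage 2 of iteration i   */
        loaded = next;              /* hand the value to the next trip  */
    }

    b[n - 1] = loaded * k;          /* epilogue: stage 2 of last iteration */
}
```

In this source-level version the prologue, the epilogue, and the copy from next to loaded are written out explicitly. On Itanium, rotating registers rename the registers on each trip through the loop, so no such copy is needed, and the special loop branches together with predication let the kernel itself cover the fill and drain phases.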

This level of optimization limits the ability to debug the application. See Section 14.27 (Debugging

optimized code) in Debugging with GDB[2] for more information on this topic.

benefits:

● Significantly faster code than produced at Level 1, due to optimized code and better use of machine

resources and Itanium architectural features.

● Non-numeric applications can be improved by 50% or more.

● Loop intensive numeric applications achieve even greater speedups due to optimizations such as

more aggressive data prefetching and software pipelining.

level two -ipo

+O2 -ipo or -O -ipo

description: