Optimizing Itanium-Based Applications (May 2011)

Users can remove optimization time restrictions at +O2 and above by using the +Onolimit or +Olimit=none option. This allows full optimization of large procedures, but can incur significant compile time increases for very large procedures, especially those with large sequences of straight-line code. If you are willing to tolerate longer compile times, +Onolimit can result in significant performance improvements.

Users can limit the amount of time spent optimizing code to completely avoid non-linear compile times using +Olimit or +Olimit=min.

limiting the size of optimized code

+O[no]size (default +Onosize)

The user can disable optimizations that greatly expand code size at +O2 and above using the +Osize option. Most optimizations improve code speed and simultaneously reduce code size. However, some optimizations can greatly increase code size. Loop unrolling is one such optimization, and is disabled with +Osize.

+Osize also disables inlining. This option can help reduce instruction cache misses. You can use +Osize with other optimization controls, such as +Onolimit and +Ofast.
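To illustrate why loop unrolling trades code size for speed, the following minimal sketch shows a simple loop next to a hand-unrolled equivalent. The function names are hypothetical, and the unroll factor of four is an assumption chosen for illustration; the transformation the compiler actually applies may differ.

```c
#include <stddef.h>

/* Rolled form: one add per iteration, smallest code footprint. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by four (illustrative factor): the loop body is
 * replicated, so there are fewer branches per element, but the code
 * is larger and a remainder loop is needed for leftover elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)   /* remainder: 0 to 3 elements */
        total += a[i];
    return total;
}
```

Both functions return the same result; the second simply occupies more instruction space, which is the kind of growth +Osize is meant to avoid.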

controlling the scheduling model

+DS[blended|itanium2|montecito|poulson|native] (default +DSblended)

Different Itanium-based implementations can have vastly different resource constraints, latencies, and other scheduling criteria. The optimizing scheduler can currently schedule for several Itanium-based implementations: Intel Itanium 2, Montecito, and Poulson. The user can schedule code to run best on each of these implementations by using +DSitanium2, +DSmontecito, and +DSpoulson, respectively. Additionally, use +DSmontecito to obtain the best schedule for Montvale and Tukwila. Use +DSpoulson to schedule for the future post-Tukwila Poulson implementation.

However, users might also want to optimize code once and have it run reasonably well on different implementations. The default setting, +DSblended, attempts to do this. Currently, +DSblended is a combination of +DSmontecito and +DSpoulson. As new Itanium Processor Family (IPF) implementations are released, +DSblended scheduling will change so that code continues to run reasonably well across the different implementations.

Use +DSnative to schedule code for best performance on the implementation on which you are compiling. As new implementations of Itanium are released, additional targets will be added and the schedule resulting from +DSblended will change.

controlling floating point optimization

+Ofltacc=[strict|default|limited|relaxed]

+O[no]fltacc

#pragma STDC FP_CONTRACT [ON/OFF/DEFAULT]

These options control optimizations on floating-point code so that the expected accuracy of floating-point computation is not violated. With +Ofltacc=strict (or its equivalent, +Ofltacc), all optimizations that can change result values are prohibited.
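As an illustration of what a value-changing optimization can look like, the following sketch shows that floating-point addition is not associative, so regrouping a sum can alter the result. Reassociation is used here only as a commonly cited example; this text does not state which specific transformations the compiler performs at each +Ofltacc level. The function names are hypothetical.

```c
/* Evaluate (a + b) + c. The volatile temporary forces the
 * intermediate sum to be rounded to float before it is reused. */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;
    return t + c;
}

/* Evaluate a + (b + c), the regrouped form that an optimizer
 * free to change result values might choose instead. */
float sum_right(float a, float b, float c)
{
    volatile float t = b + c;
    return a + t;
}
```

With the inputs 1e8f, -1e8f, and 1.0f, sum_left yields 1.0f, while sum_right yields 0.0f, because -1e8f + 1.0f rounds back to -1e8f in single precision.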

By default, or with +Ofltacc=default, the only value-changing optimization allowed is the synthesis of contractions. These include floating-point multiply-add instructions and their variants. While these instructions might produce a different value than a two-instruction multiply-and-add sequence, the resulting value is actually more accurate because it has not been subject to an intermediate rounding.
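The effect of the intermediate rounding can be sketched in portable C. The example below does not use a hardware multiply-add instruction; instead, it emulates the single rounding by carrying the product of two float values in double, where that product is exact. The function names are hypothetical, and the emulation is an approximation chosen for illustration (in general, the double-precision add can introduce a rounding of its own).

```c
/* Two-step sequence: the product is rounded to float, then the
 * sum is rounded again. The volatile store forces the first
 * rounding and keeps the compiler from contracting the expression. */
float mul_add_two_step(float a, float b, float c)
{
    volatile float prod = a * b;   /* intermediate rounding */
    return prod + c;               /* final rounding */
}

/* Emulated contraction: a float-by-float product fits exactly in a
 * double (24 + 24 significand bits <= 53), so the product is not
 * rounded before the add. For the inputs discussed below, the sum
 * is also exact and only the conversion to float rounds. */
float mul_add_single_rounding(float a, float b, float c)
{
    double exact_prod = (double)a * (double)b;
    return (float)(exact_prod + (double)c);
}
```

For a = 1 + 2^-13, b = 1 - 2^-13, and c = -1, the exact answer is -2^-26. The two-step version rounds the product to 1.0f and returns 0.0f, whereas the single-rounding version returns -2^-26, which matches the exact answer.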